AMD-K6®-2 Processor

- Advanced 6-Issue RISC86® Superscalar Microarchitecture
  - Ten parallel specialized execution units
  - Multiple sophisticated x86-to-RISC86 instruction decoders
  - Advanced two-level branch prediction
  - Speculative execution
  - Out-of-order execution
  - Register renaming and data forwarding
  - Issues up to six RISC86 instructions per clock
- Large Internal Split 64-Kbyte Level-One (L1) Cache
  - 32-Kbyte instruction cache with additional 20-Kbytes of predecode cache
  - 32-Kbyte writeback dual-ported data cache
  - Two-way set associative
  - MESI protocol support
- 3DNow!™ Technology
  - Additional instructions to improve 3D graphics and multimedia performance
  - Separate multiplier and ALU for superscalar instruction execution
- Compatible with Super7™ platform
  - Leverages high-speed 100-MHz processor bus
  - Accelerated Graphics Port (AGP) support
- High-Performance IEEE 754-Compatible and 854-Compatible Floating-Point Unit
- High-Performance Industry-Standard MMX™ Instructions
  - Dual integer ALU for superscalar execution
- 321-Pin Ceramic Pin Grid Array (CPGA) Package
- Industry-Standard System Management Mode (SMM)
- IEEE 1149.1 Boundary Scan
- x86 Binary Software Compatibility

The innovative AMD-K6®-2 processor brings industry-leading performance to PC 
systems running the extensive installed base of x86 software. Its Super7™ 
compatible, 321-pin ceramic pin grid array (CPGA) package enables the processor to 
reduce time-to-market by leveraging today’s cost-effective industry-standard 
infrastructure to deliver a superior-performing PC solution.

The AMD-K6-2 processor is the first to incorporate 3DNow!™ technology, a significant
innovation to the x86 processor architecture that drives today’s personal computers. 
With 3DNow! technology, new, more powerful hardware and software applications 
enable a more entertaining and productive PC platform. Improvements include fast 
frame rates on high-resolution scenes, superior modeling of real world environments 
and physics, life-like images and graphics, and big-screen sound and video. 


AMD has taken a leadership role in developing new instructions that enable exciting 
new levels of performance and realism. 3DNow! technology was defined and 
implemented in collaboration with Microsoft®, application developers, and graphics 
vendors, and has received an enthusiastic reception. It is compatible with today’s 
existing x86 software, is supported by industry-standard APIs, and requires no 
operating system support, thereby enabling a broad class of applications to benefit 
from 3DNow! technology. 


To provide state-of-the-art performance, the processor incorporates the innovative 
and efficient RISC86® microarchitecture, a large 64-Kbyte level-one cache (32-Kbyte 
dual-ported data cache, 32-Kbyte instruction cache with an additional 20-Kbytes of 
predecode cache), a powerful IEEE 754-compatible and 854-compatible floating-point 
execution unit, and a high-performance industry-standard multimedia execution unit 
for executing MMX™ instructions. The processor includes additional 
high-performance Single Instruction Multiple Data (SIMD) execution resources to 
support the 3DNow! technology. These techniques have been combined to deliver 
leading-edge performance on leading consumer and business applications in both the 
Microsoft Windows® 98 and Windows NT® operating environments. 


The AMD-K6-2 processor’s 6-issue RISC86 microarchitecture is a decoupled 
decode/execution superscalar design that implements state-of-the-art design 
techniques to achieve leading-edge performance. Advanced design techniques 
implemented in the AMD-K6-2 processor include multiple x86 instruction decode, 
single-clock internal RISC operations, ten execution units that support superscalar 
operation, out-of-order execution, data forwarding, speculative execution, and 
register renaming. In addition, the processor supports advanced branch prediction 
logic by implementing an 8192-entry branch history table, a branch target cache, and 
a return address stack, which combine to deliver better than a 95% prediction rate. 
These design techniques enable the AMD-K6-2 processor to issue, execute, and retire 
multiple x86 instructions per clock, resulting in excellent scalable performance.


The AMD-K6-2 processor is x86 binary code compatible. AMD’s extensive experience 
through six generations of x86 processors has been carefully integrated into the 
processor to enable compatibility with Windows 98, Windows 95, Windows 3.x, 
Windows NT, DOS, OS/2, Unix, Solaris, NetWare®, Vines, and other leading x86 
operating systems and applications. The AMD-K6-2 processor is Super7 and 
Socket 7-compatible. The Super7 platform is an extension to today’s popular and
robust Socket 7 platform. See “Super7™ Platform Initiative” on page 3 for more
information.


AMD is the world’s second-leading supplier of Windows-compatible PC processors, 
having shipped more than 120 million x86 microprocessors, including more than 60 
million Windows-compatible processors. With its combination of state-of-the-art 
features, industry-leading performance, high-performance 3DNow! technology and 
multimedia engines, x86 compatibility, and low-cost infrastructure, the AMD-K6-2 is 
the superior choice for mainstream personal computers. 


1.1 Super7™ Platform Initiative 


AMD and its industry partners launched the Super7 platform initiative in order to 
maintain the competitive vitality of the Socket 7 infrastructure through a series of 
enhancements, including the development of an industry-standard, 100-MHz 
processor bus protocol. 


In addition to the 100-MHz processor bus protocol, the Super7 initiative includes the 
introduction of chipsets that support the AGP specification, and support for a 
backside L2 cache and frontside L3 cache. Currently, over 40 motherboard vendors 
and all major BIOS and chipset vendors offer Super7 platform-based products. 


Super7™ Platform Enhancements 


The Super7 platform has the following enhancements: 


- 100-MHz processor bus: The AMD-K6-2 processor supports a 100-MHz, 800
  Mbyte/second frontside bus to provide a high-speed interface to Super7
  platform-based chipsets. The 100-MHz interface to the frontside Level 2 (L2)
  cache and main system memory speeds up access to the frontside cache and main
  memory by 50% over the 66-MHz Socket 7 interface, resulting in an
  increase of 10% in overall system performance.

- Accelerated graphics port support: AGP improves the performance of mid-range
  PCs that have small amounts of video memory on the graphics card. The
  industry-standard AGP specification enables a 133-MHz graphics interface and
  will scale to even higher levels of performance.

- Support for backside L2 and frontside L3 cache: The Super7 platform has the
  ‘headroom’ to support higher-performance AMD-K6 processors, with clock speeds
  scaling to 550 MHz and beyond. The Super7 platform also supports the
  AMD-K6-III processor, which features a full-speed, internal backside 256-Kbyte L2
  cache designed to bring new levels of performance to leading-edge desktop
  systems. This processor also supports an optional 100-MHz external L3 cache for
  even higher-performance system configurations.
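
The bus figures quoted in the first enhancement can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the 64-bit data bus described later in this document and one 8-byte transfer per bus clock (a burst peak with no wait states), and it uses the nominal 66-MHz figure for Socket 7. The function and variable names are hypothetical.

```python
# Peak-bandwidth arithmetic for a 64-bit processor bus.
# Assumption: one 8-byte transfer per bus clock (burst peak, no wait states).

BUS_WIDTH_BYTES = 8  # 64-bit data bus


def peak_bandwidth_mbytes(clock_mhz: float) -> float:
    """Return the peak transfer rate in Mbytes/second at the given bus clock."""
    return clock_mhz * BUS_WIDTH_BYTES


super7_bw = peak_bandwidth_mbytes(100)   # 100-MHz Super7 bus
socket7_bw = peak_bandwidth_mbytes(66)   # nominal 66-MHz Socket 7 bus
gain_percent = (super7_bw / socket7_bw - 1) * 100

print(f"Super7:   {super7_bw:.0f} Mbytes/s")
print(f"Socket 7: {socket7_bw:.0f} Mbytes/s")
print(f"Clock-rate gain: {gain_percent:.1f}%")
```

Under these assumptions the 100-MHz bus peaks at 800 Mbytes/second, and the clock-rate gain over the nominal 66-MHz bus is roughly 50%, in line with the figures above.
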
Super7™ Platform Advantages


The Super7 platform has the following advantages:


- Delivers performance and features competitive with alternate platforms at the
  same clock speed, and at a significantly lower cost

- Takes advantage of existing system designs for superior value

- Enables OEMs and resellers to take advantage of mature, high-volume
  infrastructure supported by multiple BIOS, chipset, graphics, and motherboard
  suppliers

- Reduces inventory and design costs with one motherboard for a wide range of
  products

- Builds on a huge installed base of more than 100 million motherboards

- Provides an easy upgrade path for future PC users, as well as a bridge to legacy
  users


By taking advantage of the low-cost, mature Socket 7 infrastructure, the Super7 
platform will continue to provide superior value and leading-edge performance for 
desktop PC systems.


2 Internal Architecture


2.1 Introduction


The AMD-K6-2 processor implements advanced design 
techniques known as the RISC86 microarchitecture. The RISC86 
microarchitecture is a decoupled decode/execution design 
approach that yields superior sixth-generation performance for 
x86-based software. This chapter describes the techniques used 
and the functional elements of the RISC86 microarchitecture. 


2.2 AMD-K6®-2 Processor Microarchitecture Overview 


When discussing processor design, it is important to understand 
the terms architecture, microarchitecture, and design 
implementation. The term architecture refers to the instruction 
set and features of a processor that are visible to software 
programs running on the processor. The architecture 
determines what software the processor can run. The 
architecture of the AMD-K6-2 processor is the 
industry-standard x86 instruction set. 


The term microarchitecture refers to the design techniques used 
in the processor to reach the target cost, performance, and 
functionality goals. The AMD-K6 family of processors is based
on a sophisticated RISC core known as the Enhanced RISC86 
microarchitecture. The Enhanced RISC86 microarchitecture is 
an advanced, second-order decoupled decode/execution design 
approach that enables industry-leading performance for 
x86-based software. 


The term design implementation refers to the actual logic and 
circuit designs from which the processor is created according to 
the microarchitecture specifications.


Enhanced RISC86® Microarchitecture


The Enhanced RISC86 microarchitecture defines the 
characteristics of the AMD-K6 family. The innovative RISC86 
microarchitecture approach implements the x86 instruction set 
by internally translating x86 instructions into RISC86 
operations. These RISC86 operations were specially designed to 
include direct support for the x86 instruction set while 
observing the RISC performance principles of fixed length 
encoding, regularized instruction fields, and a large register 
set. The Enhanced RISC86 microarchitecture used in the 
AMD-K6-2 processor enables higher processor core 
performance and promotes straightforward extensions, such as 
those added in the current AMD-K6-2 processor and those 
planned for the future. Instead of directly executing complex 
x86 instructions, which have lengths of 1 to 15 bytes, the 
AMD-K6-2 processor executes the simpler and easier 
fixed-length RISC86 operations, while maintaining the 
instruction coding efficiencies found in x86 programs. 


The AMD-K6-2 processor contains parallel decoders, a 
centralized RISC86 operation scheduler, and ten execution 
units that support superscalar operation—multiple decode, 
execution, and retirement—of x86 instructions. These elements 
are packed into an aggressive and highly efficient six-stage 
pipeline. 


AMD-K6®-2 Processor Block Diagram. As shown in Figure 1 on page 
7, the high-performance, out-of-order execution engine of the 
AMD-K6-2 processor is mated to a split level-one 64-Kbyte 
writeback cache with 32 Kbytes of instruction cache and 32 
Kbytes of data cache. The instruction cache feeds the decoders 
and, in turn, the decoders feed the scheduler. The Instruction Control Unit
(ICU) issues and retires RISC86 operations contained in the scheduler. The
system bus interface is an industry-standard 64-bit Super7 and 
Socket 7 demultiplexed bus. 


The AMD-K6-2 processor combines the latest in processor 
microarchitecture to provide the highest x86 performance for 
today’s personal computers. The AMD-K6-2 processor offers 
true sixth-generation performance and x86 binary software 
compatibility.


Figure 1. AMD-K6-2 Processor Block Diagram


Decoders. Decoding of the x86 instructions begins when the 
on-chip instruction cache is filled. Predecode logic determines 
the length of an x86 instruction on a byte-by-byte basis. This 
predecode information is stored, along with the x86 
instructions, in the instruction cache, to be used later by the 
decoders. The decoders translate on-the-fly, with no additional 
latency, up to two x86 instructions per clock into RISC86 
operations. 


Note: In this chapter, “clock” refers to a processor clock. 


The AMD-K6-2 processor categorizes x86 instructions into three 
types of decodes—short, long, and vector. The decoders process 
either two short, one long, or one vector decode at a time. The 
three types of decodes have the following characteristics: 


- Short decodes: x86 instructions less than or equal to seven
  bytes in length

- Long decodes: x86 instructions less than or equal to 11
  bytes in length

- Vector decodes: complex x86 instructions


Short and long decodes are processed completely within the
decoders. Vector decodes are started by the decoders and then
completed by fetched sequences from an on-chip ROM. After
decoding, the RISC86 operations are delivered to the scheduler
for dispatching to the execution units.
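
The three decode categories can be sketched as a small classifier. This is a simplified illustration, not the processor's actual selection logic: it uses only the length limits stated above plus a hypothetical `is_complex` flag, whereas the real decoders also take into account how commonly an instruction is used.

```python
def classify_decode(length_bytes: int, is_complex: bool = False) -> str:
    """Classify an x86 instruction as a 'short', 'long', or 'vector' decode.

    Simplified sketch: only instruction length and a hypothetical
    'is_complex' flag are considered.
    """
    if not 1 <= length_bytes <= 15:
        raise ValueError("x86 instructions are 1 to 15 bytes long")
    if is_complex:
        return "vector"       # complex instructions go to the vector decoder
    if length_bytes <= 7:
        return "short"        # up to seven bytes
    if length_bytes <= 11:
        return "long"         # more than seven, up to 11 bytes
    return "vector"           # everything else is handled as a vector decode


for length in (2, 7, 9, 12):
    print(length, classify_decode(length))
```
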


Scheduler/Instruction Control Unit. The centralized scheduler or 
buffer is managed by the Instruction Control Unit (ICU). The 
ICU buffers and manages up to 24 RISC86 operations at a time. 
This corresponds to between 6 and 12 x86 instructions. The buffer size of 24 is
matched to the processor’s six-stage RISC86 pipeline and its decode rate of
four RISC86 operations per clock. The scheduler accepts
as many as four RISC86 operations at a time from the decoders 
and retires up to four RISC86 operations per clock cycle. The 
ICU is capable of simultaneously issuing up to six RISC86 
operations at a time to the execution units. This consists of the 
following types of operations: 


- Memory load operation
- Memory store operation
- Complex integer, MMX or 3DNow! register operation
- Simple integer, MMX or 3DNow! register operation
- Floating-point register operation
- Branch condition evaluation


Registers. When managing the 24 RISC86 operations, the ICU 
uses 69 physical registers contained within the RISC86 
microarchitecture. 48 of the physical registers are located in a
general register file and are grouped as 24 committed or 
architectural registers plus 24 rename registers. The 24 
architectural registers consist of 16 scratch registers and 8 
registers that correspond to the x86 general-purpose registers — 
EAX, EBX, ECX, EDX, EBP, ESP, ESI, and EDI. There is an 
analogous set of registers specifically for MMX and 3DNow! 
operations. There are 9 MMX/3DNow! committed or 
architectural registers plus 12 MMX/3DNow! rename registers. 
The 9 architectural registers consist of one scratch register and 
8 registers that correspond to the MMX registers (mm0-mm7), 
as shown in Figure 17 on page 29. 
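
The register counts in the paragraph above add up to the stated total of 69 physical registers. The short check below simply restates that arithmetic; the dictionary names are illustrative.

```python
# Physical register accounting, using the counts given in the text.
general = {
    "architectural": 16 + 8,   # 16 scratch + 8 x86 general-purpose registers
    "rename": 24,
}
multimedia = {
    "architectural": 1 + 8,    # 1 scratch + 8 MMX registers (mm0-mm7)
    "rename": 12,
}

general_total = sum(general.values())        # 48 general registers
multimedia_total = sum(multimedia.values())  # 21 MMX/3DNow! registers
total = general_total + multimedia_total

print(general_total, multimedia_total, total)
```
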


Branch Logic. The AMD-K6-2 processor is designed with highly 
sophisticated dynamic branch logic consisting of the following: 
- Branch history/prediction table
- Branch target cache
- Return address stack


The AMD-K6-2 processor implements a two-level branch
prediction scheme based on an 8192-entry branch history table. 
The branch history table stores prediction information that is 
used for predicting conditional branches. Because the branch 
history table does not store predicted target addresses, special 
address ALUs calculate target addresses on-the-fly during 
instruction decode. The branch target cache augments 
predicted branch performance by avoiding a one clock 
cache-fetch penalty. This specialized target cache does this by 
supplying the first 16 bytes of target instructions to the 
decoders when branches are predicted. The return address
stack is a dedicated structure designed to optimize CALL and RET
instruction pairs. In summary, the AMD-K6-2
processor uses dynamic branch logic to minimize delays due to
the branch instructions that are common in x86 software.
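
To make the idea of two-level prediction concrete, the sketch below models a generic two-level predictor with an 8192-entry table. It is not the AMD-K6-2 algorithm, whose details are not given here. The table is assumed to hold 2-bit saturating counters, and the index is assumed to be formed by XOR-ing the branch address with a 9-bit global history (a gshare-style scheme chosen purely for illustration). All names and parameters are hypothetical.

```python
class TwoLevelPredictor:
    """Generic two-level predictor sketch (gshare-style indexing assumed)."""

    def __init__(self, entries: int = 8192, history_bits: int = 9):
        self.entries = entries
        self.history_mask = (1 << history_bits) - 1
        self.history = 0                 # first level: recent branch outcomes
        self.counters = [1] * entries    # second level: 2-bit counters, weakly not-taken

    def _index(self, pc: int) -> int:
        return (pc ^ self.history) % self.entries

    def predict(self, pc: int) -> bool:
        """Predict taken when the selected counter is in a 'taken' state (2 or 3)."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# A branch that alternates taken/not-taken is learned through the history bits.
predictor = TwoLevelPredictor()
for i in range(200):
    predictor.update(0x1234, i % 2 == 0)
print(predictor.predict(0x1234))  # the next outcome in the pattern is 'taken'
```

The point of the history level is that a branch with a regular pattern maps to different counters for different recent histories, so each counter sees a consistent outcome.
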


3DNow!™ Technology. AMD has taken a lead role in improving the 
multimedia and 3D capabilities of the x86 processor family with 
the introduction of 3DNow! technology, which uses a packed, 
single-precision, floating-point data format and Single 
Instruction Multiple Data (SIMD) operations based on the 
MMX technology model.


2.3 Cache, Instruction Prefetch, and Predecode Bits 


Cache 


The writeback level-one cache on the AMD-K6-2 processor is 
organized as a separate 32-Kbyte instruction cache and a
32-Kbyte data cache with two-way set associativity. The cache 
line size is 32 bytes and lines are prefetched from main memory 
using an efficient pipelined burst transaction. As the 
instruction cache is filled, each instruction byte is analyzed for 
instruction boundaries using predecoding logic. Predecoding
annotates each instruction byte with 5 bits of information
that later enable the decoders to efficiently decode multiple
instructions simultaneously.


The processor cache design takes advantage of a sectored 
organization (see Figure 2 on page 10). Each sector consists of 
64 bytes configured as two 32-byte cache lines. The two cache 
lines of a sector share a common tag but have separate pairs of 
MESI (Modified, Exclusive, Shared, Invalid) bits that track the 
state of each cache line.


Two forms of cache misses and associated cache fills can take
place: a tag-miss cache fill and a tag-hit cache fill. In the case
of a tag-miss cache fill, the miss is due to a tag mismatch, in
which case the required cache line is filled from external
memory, and the cache line within the sector that was not
required is marked as invalid. In the case of a tag-hit cache fill,
the address matches the tag, but the requested cache line is
marked as invalid. The required cache line is filled from
external memory, and the cache line within the sector that is
not required remains in the same cache state.


Prefetching


The AMD-K6-2 processor conditionally performs cache
prefetching, which results in the filling of the required cache
line first, and a prefetch of the second cache line making up the
other half of the sector. From the perspective of the external
bus, the two cache-line fills typically appear as two 32-byte
burst read cycles occurring back-to-back or, if allowed, as
pipelined cycles.


The 3DNow! technology includes an instruction called
PREFETCH that allows a cache line to be prefetched into the
data cache. The PREFETCH instruction format is defined in
Table 17, “3DNow!™ Instructions,” on page 81. For more
detailed information, see the 3DNow!™ Technology Manual,
order# 21928.


Predecode Bits


Decoding x86 instructions is particularly difficult because the 
instructions are variable-length and can be from 1 to 15 bytes 
long. Predecode logic supplies the five predecode bits that are 
associated with each instruction byte. The predecode bits 
indicate the number of bytes to the start of the next x86 
instruction. The predecode bits are stored in an extended 
instruction cache alongside each x86 instruction byte as shown 
in Figure 2. The predecode bits are passed with the instruction 
bytes to the decoders where they assist with parallel x86 
instruction decoding. 
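
The following sketch illustrates the predecode idea and the storage it implies. The per-byte value is modeled as the distance in bytes to the start of the next instruction, following the description above; the exact bit encoding used by the processor is not specified here, so the representation is an assumption, and the function name is hypothetical.

```python
PREDECODE_BITS_PER_BYTE = 5
INSTRUCTION_CACHE_BYTES = 32 * 1024


def predecode(instruction_lengths):
    """For each instruction byte, give the byte count to the next instruction start."""
    distances = []
    for length in instruction_lengths:
        if not 1 <= length <= 15:
            raise ValueError("x86 instructions are 1 to 15 bytes long")
        # First byte of an n-byte instruction is n bytes from the next start,
        # the second byte is n - 1 bytes away, and so on down to 1.
        distances.extend(range(length, 0, -1))
    return distances


# Three consecutive instructions of 1, 3, and 2 bytes.
print(predecode([1, 3, 2]))

# Storage needed for 5 predecode bits per instruction-cache byte.
predecode_bytes = INSTRUCTION_CACHE_BYTES * PREDECODE_BITS_PER_BYTE // 8
print(predecode_bytes // 1024, "Kbytes")
```

Five bits per byte across a 32-Kbyte instruction cache comes to 20 Kbytes, which matches the size of the predecode cache quoted in the feature summary.
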








[Figure: a cache sector consists of one tag address and two cache lines (Cache Line 0 and Cache Line 1). Each cache line holds Byte 31 through Byte 0, each byte with its predecode bits, followed by the MESI bits for that line.]


Figure 2. Cache Sector Organization
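
The sector behavior described in this section can be modeled in a few lines. The sketch below is a simplified illustration under several assumptions that are not stated in the text: a filled line is placed in the Exclusive state, write-back of modified lines and the conditional prefetch of the other line are ignored, and the address split assumes 256 sets (32 Kbytes / 2 ways / 64-byte sectors) with one tag per sector. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sector:
    """One cache sector: a shared tag and a MESI state per 32-byte line."""
    tag: Optional[int] = None
    state: List[str] = field(default_factory=lambda: ["I", "I"])


def split_address(addr: int):
    """Split a 32-bit address into (tag, set index, line in sector, byte offset)."""
    offset = addr & 0x1F          # 32-byte line
    line = (addr >> 5) & 0x1      # which line of the 64-byte sector
    index = (addr >> 6) & 0xFF    # 256 sets (assumed geometry)
    tag = addr >> 14
    return tag, index, line, offset


def access(sector: Sector, tag: int, line: int) -> str:
    """Classify an access to one sector and apply the fill rules from the text."""
    if sector.tag == tag and sector.state[line] != "I":
        return "hit"
    if sector.tag == tag:
        # Tag-hit fill: only the requested line changes state.
        sector.state[line] = "E"          # fill state is an assumption
        return "tag-hit fill"
    # Tag-miss fill: new tag, requested line filled, other line invalidated.
    sector.tag = tag
    sector.state[line] = "E"              # fill state is an assumption
    sector.state[1 - line] = "I"
    return "tag-miss fill"


sector = Sector()
print(access(sector, 0x12, 0), sector.state)
print(access(sector, 0x12, 1), sector.state)
print(access(sector, 0x34, 1), sector.state)
```

Sharing one tag between two lines halves the tag storage, at the cost that replacing one line of a sector with data from a different address invalidates its neighbor.
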
2.4 Instruction Fetch and Decode


Instruction Fetch


The processor can fetch up to 16 bytes per clock out of the 
instruction cache or branch target cache. The fetched 
information is placed into a 16-byte instruction buffer that 
feeds directly into the decoders (see Figure 3). Fetching can 
occur along a single execution stream with up to seven 
outstanding branches taken. 


The instruction fetch logic is capable of retrieving any 16 
contiguous bytes of information within a 32-byte boundary. 
There is no additional penalty when the 16 bytes of instructions 
lie across a cache line boundary. The instruction bytes are 
loaded into the instruction buffer as they are consumed by the 
decoders. Although instructions can be consumed with byte 
granularity, the instruction buffer is managed on a
memory-aligned word (two bytes) organization. Therefore, 
instructions are loaded and replaced with word granularity. 
When a control transfer occurs—such as a JMP instruction— 
the entire instruction buffer is flushed and reloaded with a new 
set of 16 instruction bytes. 


[Figure labels: 32-Kbyte level-one instruction cache; branch-target cache (16 x 16 bytes); branch target address adders; return address stack; 16-byte fetch paths; instruction buffer holding 16 instruction bytes plus 16 sets of predecode bits.]


Figure 3. The Instruction Buffer


Instruction Decode


The AMD-K6-2 processor decode logic is designed to decode 
multiple x86 instructions per clock (see Figure 4). The decode 
logic accepts x86 instruction bytes and their predecode bits 
from the instruction buffer, locates the actual instruction 
boundaries, and generates RISC86 operations from these x86 
instructions. 


RISC86 operations are fixed-length internal instructions. Most 
RISC86 operations execute in a single clock. RISC86 operations 
are combined to perform every function of the x86 instruction 
set. Some x86 instructions are decoded into as few as zero 
RISC86 operations—for instance a NOP—or one RISC86 
operation—a register-to-register add. More complex x86 
instructions are decoded into several RISC86 operations. 





[Figure labels: instruction buffer; short decoder #1; short decoder #2; long decoder; vector decoder; vector address; on-chip ROM; RISC86® sequencer; 4 RISC86 operations.]


Figure 4. AMD-K6®-2 Processor Decode Logic


The AMD-K6-2 processor uses a combination of decoders to
convert x86 instructions into RISC86 operations. The hardware 
consists of three sets of decoders—two parallel short decoders, 
one long decoder, and one vector decoder. The two parallel 
short decoders translate the most commonly-used x86 
instructions (moves, shifts, branches, ALU, FPU) and the 
extensions to the x86 instruction set (including MMX and 
3DNow! instructions) into zero, one, or two RISC86 operations 
each. The short decoders only operate on x86 instructions that 
are up to seven bytes long. In addition, they are designed to 
decode up to two x86 instructions per clock. The 
commonly-used x86 instructions that are greater than seven 
bytes but not more than 11 bytes long, and semi-commonly-used 
x86 instructions that are up to seven bytes long are handled by 
the long decoder. 


The long decoder only performs one decode per clock and 
generates up to four RISC86 operations. All other translations 
(complex instructions, serializing conditions, interrupts and 
exceptions, etc.) are handled by a combination of the vector 
decoder and RISC86 operation sequences fetched from an 
on-chip ROM. For complex operations, the vector decoder logic 
provides the first set of RISC86 operations and a vector (initial 
ROM address) to a sequence of further RISC86 operations. The 
same types of RISC86 operations are fetched from the ROM as 
those that are generated by the hardware decoders. 


Note: Although all three sets of decoders are simultaneously fed a 
copy of the instruction buffer contents, only one of the three 
types of decoders is used during any one decode clock. 


The decoders or the on-chip RISC86 ROM always generate a 
group of four RISC86 operations. For decodes that cannot fill the 
entire group with four RISC86 operations, RISC86 NOP 
operations are placed in the empty locations of the grouping. For 
example, a long-decoded x86 instruction that converts to only 
three RISC86 operations is padded with a single RISC86 NOP 
operation and then passed to the scheduler. Up to six groups or 
24 RISC86 operations can be placed in the scheduler at a time. 
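
The grouping rule above can be sketched as follows. The operation names are placeholders rather than real RISC86 mnemonics, and the helper name is hypothetical. The instruction-count range assumes that each group comes from one decode clock, which handles either one long or vector decode or two short decodes.

```python
GROUP_SIZE = 4         # RISC86 operations per decode group
SCHEDULER_GROUPS = 6   # groups held by the scheduler
NOP = "NOP"


def pad_group(ops):
    """Pad a decode result to a full group of four operations with NOPs."""
    if len(ops) > GROUP_SIZE:
        raise ValueError("a decode group holds at most four operations")
    return list(ops) + [NOP] * (GROUP_SIZE - len(ops))


# A long decode that produces three operations gets one NOP of padding.
print(pad_group(["LOAD", "ADD", "STORE"]))

scheduler_capacity = GROUP_SIZE * SCHEDULER_GROUPS   # 24 operations
min_x86_instructions = SCHEDULER_GROUPS * 1          # one long or vector decode per group
max_x86_instructions = SCHEDULER_GROUPS * 2          # two short decodes per group
print(scheduler_capacity, min_x86_instructions, max_x86_instructions)
```
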


All of the common, and a few of the uncommon, floating-point 
instructions (also known as ESC instructions) are hardware 
decoded as short decodes. This decode generates a RISC86 
floating-point operation and, optionally, an associated 
floating-point load or store operation. Floating-point or ESC
instruction decode is only allowed in the first short decoder, but
non-ESC instructions can be decoded simultaneously by the 
second short decoder along with an ESC instruction decode in 
the first short decoder. 


All of the MMX and 3DNow! instructions, with the exception of 
the EMMS, FEMMS, and PREFETCH instructions, are 
hardware decoded as short decodes. The MMX instruction 
decode generates a RISC86 MMX operation and, optionally, an 
associated MMX load or store operation. A 3DNow! instruction 
decode generates a RISC86 3DNow! operation and, optionally, 
an associated load or store operation. MMX and 3DNow! 
instructions can be decoded in either or both of the short 
decoders. 


2.5 Centralized Scheduler 


The scheduler is the heart of the AMD-K6-2 processor (see 
Figure 5 on page 15). It contains the logic necessary to manage 
out-of-order execution, data forwarding, register renaming, 
simultaneous issue and retirement of multiple RISC86 
operations, and speculative execution. The scheduler’s buffer 
can hold up to 24 RISC86 operations. This equates to a maximum 
of 12 x86 instructions. The scheduler can issue RISC86 
operations from any of the 24 locations in the buffer. When 
possible, the scheduler can simultaneously issue a RISC86 
operation to any available execution unit (store, load, branch, 
register X integer/multimedia, register Y integer/multimedia, or 
floating-point). In total, the scheduler can issue up to six and 
retire up to four RISC86 operations per clock. 
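
A minimal sketch of per-clock issue selection is shown below. It assumes one issue port per operation type and an oldest-first choice among ready operations. The text states only that up to six operations can issue per clock from any buffer location, so the selection policy, the port names, and the `ready` flag are illustrative assumptions.

```python
ISSUE_PORTS = ("load", "store", "register_x", "register_y",
               "floating_point", "branch")


def select_for_issue(buffer):
    """Pick at most one ready operation per issue port, oldest first.

    'buffer' is a list of (op_type, ready) tuples ordered oldest to newest.
    Returns the buffer indices chosen for issue in this clock.
    """
    chosen = {}
    for index, (op_type, ready) in enumerate(buffer):
        if ready and op_type in ISSUE_PORTS and op_type not in chosen:
            chosen[op_type] = index
    return sorted(chosen.values())


buffer = [
    ("load", True),
    ("load", True),         # second load waits: the load port is already taken
    ("register_x", False),  # operands not ready yet
    ("branch", True),
    ("store", True),
]
print(select_for_issue(buffer))
```

Because the younger branch and store operations can issue ahead of the stalled register operation, this also shows where out-of-order issue comes from.
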


The main advantage of the scheduler and its operation buffer is 
the ability to examine an x86 instruction window equal to 12 
x86 instructions at one time. This advantage is due to the fact 
that the scheduler operates on the RISC86 operations in 
parallel and allows the AMD-K6-2 processor to perform 
dynamic on-the-fly instruction code scheduling for optimized 
execution. Although the scheduler can issue RISC86 operations 
for out-of-order execution, it always retires x86 instructions in 
order.


Figure 5. AMD-K6®-2 Processor Scheduler


2.6 Execution Units 


The AMD-K6-2 processor contains ten parallel execution 
units—store, load, integer X ALU, integer Y ALU, MMX ALU 
(X), MMX ALU (Y), MMX/3DNow! multiplier, 3DNow! ALU, 
floating-point, and branch condition. Each unit is independent 
and capable of handling the RISC86 operations. Table 1 on 
page 16 details the execution units, functions performed within 
these units, operation latency, and operation throughput. 


The store and load execution units are two-stage pipelined 
designs. The store unit performs data writes and register 
calculation for LEA/PUSH. Data memory and register writes 
from stores are available after one clock. Store operations are 
held in a store queue prior to execution. From there, they 
execute in order. The load unit performs data memory reads. 
Data is available from the load unit after two clocks.


The Integer X execution unit can operate on all ALU
operations, multiplies, divides (signed and unsigned), shifts, 
and rotates. 


The Integer Y execution unit can operate on the basic word and 
doubleword ALU operations—ADD, AND, CMP, OR, SUB, 
XOR, zero-extend and sign-extend operands. 


Table 1. Execution Latency and Throughput of Execution Units

Functional Unit          | Function                                 | Latency | Throughput
Store                    | LEA/PUSH, Address (Pipelined)            | 1       | 1
Store                    | Memory Store (Pipelined)                 | 1       | 1
Load                     | Memory Loads (Pipelined)                 | 2       | 1
Integer X                | Integer ALU                              | 1       | 1
Integer X                | Integer Multiply                         | 2-3     | 2-3
Integer X                | Integer Shift                            | 1       | 1
Multimedia (processes    | MMX ALU                                  |         |
MMX instructions)        | MMX Shifts, Packs, Unpack                | 1       | 1
                         | MMX Multiply                             | 2       | 1
Integer Y                | Basic ALU (16-bit and 32-bit operands)   | 1       | 1
Branch                   | Resolves Branch Conditions               | 1       | 1
FPU                      | FADD, FSUB, FMUL                         | 2       | 2
3DNow!                   | 3DNow! ALU                               | 2       | 1
3DNow!                   | 3DNow! Multiply                          | 2       | 1
3DNow!                   | 3DNow! Convert                           | 2       | 1


The functional units that execute MMX and 3DNow! 
instructions share pipeline control with the Integer X and 
Integer Y units. 


The register X and Y functional units are attached to the issue 
bus for the register X execution pipeline or the issue bus for the 
register Y execution pipeline or both. Each register pipeline 
has dedicated resources that consist of an integer execution 
unit and an MMX ALU execution unit, therefore allowing 
superscalar operation on integer and MMX instructions. In 
addition, both the X and Y issue buses are connected to the 
3DNow! ALU, the MMX/3DNow! multiplier and MMX shifter, 
which allows the appropriate RISC86 operation to be issued 
through either bus. Figure 6 on page 17 shows the details of the 
X and Y register pipelines.


Figure 6. Register X and Y Functional Units


The branch condition unit is separate from the branch 
prediction logic in that it resolves conditional branches such as 
JCC and LOOP after the branch condition has been evaluated. 


2.7 Branch-Prediction Logic 


Sophisticated branch logic that can minimize or hide the impact 
of changes in program flow is designed into the AMD-K6-2 
processor. Branches in x86 code fit into two categories— 
unconditional branches, which always change program flow (that 
is, the branches are always taken) and conditional branches, 
which may or may not divert program flow (that is, the branches 
are taken or not-taken). When a conditional branch is not taken, 
the processor simply continues decoding and executing the next 
instructions in memory. 


Typical applications have up to 10% unconditional branches
and another 10% to 20% conditional branches. The AMD-K6-2
processor branch logic has been designed to handle this type of
program behavior and its negative effects on instruction
execution, such as stalls due to delayed instruction fetching and
the draining of the processor pipeline. The branch logic
contains an 8192-entry branch history table, a 16-entry by
16-byte branch target cache, a 16-entry return address stack,
and a branch execution unit.


Branch History Table


The AMD-K6-2 processor handles unconditional branches
without any penalty by redirecting instruction fetching to the
target address of the unconditional branch. However,
conditional branches require the use of the dynamic
branch-prediction mechanism built into the AMD-K6-2
processor. A two-level adaptive history algorithm is
implemented in an 8192-entry branch history table. This table
stores executed branch information, predicts individual
branches, and predicts the behavior of groups of branches. To
accommodate the large branch history table, the AMD-K6-2
processor does not store predicted target addresses. Instead,
the branch target addresses are calculated on-the-fly using
ALUs during the decode stage. The adders calculate all
possible target addresses before the instructions are fully
decoded and the processor chooses which addresses are valid.


Branch Target Cache


To avoid a one-clock cache-fetch penalty when a branch is
predicted taken, a built-in branch target cache supplies the first
16 bytes of instructions directly to the instruction buffer
(assuming the target address hits this cache). (See Figure 3 on
page 11.) The branch target cache is organized as 16 entries of
16 bytes. In total, the branch prediction logic achieves branch
prediction rates greater than 95%.


Return Address Stack


The return address stack is a special device designed to 
optimize CALL and RET pairs. Software is typically compiled 
with subroutines that are frequently called from various places 
in a program. This is usually done to save space. Entry into the 
subroutine occurs with the execution of a CALL instruction. At 
that time, the processor pushes the address of the next 
instruction in memory following the CALL instruction onto the 
stack (allocated space in memory). When the processor 
encounters a RET instruction (within or at the end of the 
subroutine), the branch logic pops the address from the stack 
and begins fetching from that location. To avoid the latency of 
main memory accesses during CALL and RET operations, the 
return address stack caches the pushed addresses. 
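
The CALL/RET pairing described above can be modeled in a few lines. The following Python sketch is only an illustration of a 16-entry return address stack, not the processor's actual logic. The class and method names are hypothetical, and the overflow behavior (the oldest entry is discarded) is an assumption that the data sheet does not state.

```python
from collections import deque


class ReturnAddressStack:
    """Toy model of a fixed-depth return address stack (names are hypothetical)."""

    def __init__(self, depth=16):
        # Assumption: when the stack is full, the oldest entry is dropped.
        self._entries = deque(maxlen=depth)

    def on_call(self, return_address):
        # A CALL records the address of the instruction that follows it.
        self._entries.append(return_address)

    def on_ret(self):
        # A RET consumes the most recent entry; None means "no cached address".
        return self._entries.pop() if self._entries else None


ras = ReturnAddressStack()
ras.on_call(0x1005)   # outer CALL
ras.on_call(0x2009)   # nested CALL
inner = ras.on_ret()  # nested RET is served first
outer = ras.on_ret()
empty = ras.on_ret()  # nothing cached any more
```

Nested calls return in last-in, first-out order, which is why a small stack is enough to cover typical subroutine nesting.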
The branch execution unit enables efficient speculative
execution. This unit gives the processor the ability to execute 
instructions beyond conditional branches before knowing 
whether the branch prediction was correct. The AMD-K6-2 
processor does not permanently update the x86 registers or 
memory locations until all speculatively executed conditional 
branch instructions are resolved. When a prediction is 
incorrect, the processor backs out to the point of the 
mispredicted branch instruction and restores all registers. The 
AMD-K6-2 processor can support up to seven outstanding 
branches. 





3 Software Environment


This chapter provides a general overview of the AMD-K6-2
processor’s x86 software environment and briefly describes the
data types, registers, operating modes, interrupts, and
instructions supported by the AMD-K6-2 architecture and
design implementation.


The stepping of the Model 8 determines the implementation
and format of five Model-Specific Registers (MSRs). This
document covers the following two stepping ranges of the
AMD-K6-2 processor:


- Model 8/[7:0] is any of eight possible model/steppings:
  Models 8/0, 8/1, 8/2, 8/3, 8/4, 8/5, 8/6, or 8/7. Model 8/[7:0]
  implements seven MSRs, and the bits and fields within these
  seven MSRs are defined identically.


- Model 8/[F:8] is any of eight possible model/steppings:
  Models 8/8, 8/9, 8/A, 8/B, 8/C, 8/D, 8/E, or 8/F. Model 8/[F:8]
  implements the same seven MSRs as the Model 8/[7:0], but
  the bits and fields within two of these MSRs are not defined
  identically. Also, Model 8/[F:8] supports three additional
  MSRs for a total of ten MSRs.


The name AMD-K6-2 processor by itself refers to all steppings of
the Model 8. See “AMD-K6®-2 Processor Model 8/[F:8]
Registers” on page 50 for the MSRs that are implemented only
on the Model 8/[F:8].


3.1 Registers


The AMD-K6-2 processor contains all the registers defined by 
the x86 architecture, including general-purpose, segment, 
floating-point, MMX/3DNow!, EFLAGS, control, task, debug, 
test, and descriptor/memory-management registers. In 
addition, this chapter provides information on the AMD-K6-2 
processor MSRs. 


Note: Areas of the register designated as Reserved should not be 
modified by software. 
General-Purpose Registers


The eight 32-bit x86 general-purpose registers are used to hold
integer data or memory pointers used by instructions. Table 2
contains a list of the general-purpose registers and the
functions for which they are used.


Table 2. General-Purpose Registers

Register | Function
EAX      | Commonly used as an accumulator
EBX      | Commonly used as a pointer
ECX      | Commonly used for counting in loop operations
EDX      | Commonly used to hold I/O information and to pass parameters
EDI      | Commonly used as a destination pointer by the ES segment
ESI      | Commonly used as a source pointer by the DS segment
ESP      | Used to point to the stack segment
EBP      | Used to point to data within the stack segment




















In order to support byte and word operations, EAX, EBX, ECX, 
and EDX can also be used as 8-bit and 16-bit registers. The 
shorter registers are overlaid on the longer ones. For example, 
the name of the 16-bit version of EAX is AX (low 16 bits of 
EAX) and the 8-bit names for AX are AH (high order bits) and 
AL (low order bits). The same naming convention applies to 
EBX, ECX, and EDX. EDI, ESI, ESP, and EBP can be used as 
smaller 16-bit registers called DI, SI, SP, and BP respectively, 
but these registers do not have 8-bit versions. Figure 7 shows the 
EAX register with its name components, and Table 3 lists the 
doubleword (32-bit) general-purpose registers and their 
corresponding word (16-bit) and byte (8-bit) versions. 
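
The overlay of the shorter names on EAX can be shown with plain bit masking. This Python sketch is only an illustration. The function names are hypothetical, and it models register values as integers rather than accessing any hardware.

```python
def sub_registers(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high-order byte of AX
    al = ax & 0xFF           # low-order byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Writing AL replaces only bits 7-0; the rest of EAX is unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = sub_registers(0x12345678)
updated = write_al(0x12345678, 0xAB)
```

The same masks apply to EBX, ECX, and EDX. For EDI, ESI, ESP, and EBP only the 16-bit view exists.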


[Figure 7 layout: EAX occupies bits 31-0; AX is bits 15-0; AH is
bits 15-8; AL is bits 7-0.]


Figure 7. EAX Register with 16-Bit and 8-Bit Name Components
Table 3. General-Purpose Register Doubleword, Word, and Byte Names

32-Bit Name  | 16-Bit Name | 8-Bit Name        | 8-Bit Name
(Doubleword) | (Word)      | (High-order Bits) | (Low-order Bits)
EAX          | AX          | AH                | AL
EBX          | BX          | BH                | BL
ECX          | CX          | CH                | CL
EDX          | DX          | DH                | DL
EDI          | DI          | -                 | -
ESI          | SI          | -                 | -
ESP          | SP          | -                 | -
EBP          | BP          | -                 | -

Integer Data Types


Four types of data are used in general-purpose registers: byte,
word, doubleword, and quadword integers. Figure 8 shows the
format of the integer data registers.

Data Type          | Precision | Bits
Byte Integer       | 8 Bits    | 7-0
Word Integer       | 16 Bits   | 15-0
Doubleword Integer | 32 Bits   | 31-0
Quadword Integer   | 64 Bits   | 63-0


Figure 8. Integer Data Registers
Segment Registers


The six 16-bit segment registers are used as pointers to areas
(segments) of memory. Table 4 lists the segment registers and
their functions. Figure 9 shows the format for all six segment
registers.


Table 4. Segment Registers

Segment Register | Segment Register Function
CS               | Code segment, where instructions are located
DS               | Data segment, where data is located
ES               | Data segment, where data is located
FS               | Data segment, where data is located
GS               | Data segment, where data is located
SS               | Stack segment


Figure 9. Segment Register


Segment Usage


The operating system determines the type of memory model 
that is implemented. The segment register usage is determined 
by the operating system’s memory model. In a Real mode 
memory model the segment register points to the base address 
in memory. In a Protected mode memory model the segment 
register is called a selector and it selects a segment descriptor 
in a descriptor table. This descriptor contains a pointer to the 
base of the segment, the limit of the segment, and various 
protection attributes. For more information on descriptor 
formats, see “Descriptors and Gates” on page 46. Figure 10 on 
page 25 shows segment usage for Real mode and Protected 
mode memory models. 
[Figure 10 panels: Real Mode Memory Model; Protected Mode Memory
Model; both map onto Physical Memory.]


Figure 10. Segment Usage


Instruction Pointer


The instruction pointer (EIP or IP) is used in conjunction with
the code segment register (CS). The instruction pointer is
either a 32-bit register (EIP) or a 16-bit register (IP) that keeps
track of where the next instruction resides within memory. This
register cannot be directly manipulated, but can be altered by
modifying return pointers when a JMP or CALL instruction is
used.


Floating-Point Registers


The floating-point execution unit in the AMD-K6-2 processor is
designed to perform mathematical operations on non-integer
numbers. This floating-point unit conforms to the IEEE 754 and
854 standards and uses several registers to meet these
standards: eight numeric floating-point registers, a status
word register, a control word register, and a tag word register.
The eight floating-point registers are physically 80 bits wide
and labeled FPR0-FPR7. Figure 11 shows the format of these
floating-point registers. See “Floating-Point Register Data
Types” on page 28 for information on allowable floating-point
data types.

[Figure 11 layout: sign, bit 79; exponent, bits 78-64; significand,
bits 63-0.]


Figure 11. Floating-Point Register


The 16-bit FPU status word register contains information about 
the state of the floating-point unit. Figure 12 shows the format 
of this register. 


Description                  | Bits
FPU Busy                     | 15
Condition Code               | 14
Top of Stack Pointer (TOSP)  | 13-11
Condition Code               | 10
Condition Code               | 9
Condition Code               | 8
Error Summary Status         | 7
Stack Fault                  | 6

Exception Flags              | Bits
Precision Error              | 5
Underflow Error              | 4
Overflow Error               | 3
Zero Divide Error            | 2
Denormalized Operation Error | 1
Invalid Operation Error      | 0

TOSP Information
000 = FPR0
111 = FPR7


Figure 12. FPU Status Word Register
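
As an illustration of the bit assignments in Figure 12, the following Python sketch splits a 16-bit status word into its fields. The function and dictionary names are hypothetical, and the sample value is made up for the example. It is a decoding aid, not a description of how the FPU produces the word.

```python
# Single-bit fields of the FPU status word, keyed by bit position (Figure 12).
STATUS_FLAGS = {
    15: "FPU Busy",
    7: "Error Summary Status",
    6: "Stack Fault",
    5: "Precision Error",
    4: "Underflow Error",
    3: "Overflow Error",
    2: "Zero Divide Error",
    1: "Denormalized Operation Error",
    0: "Invalid Operation Error",
}


def decode_status_word(sw):
    """Split a 16-bit FPU status word into TOSP, condition codes, and flags."""
    tosp = (sw >> 11) & 0b111                      # bits 13-11
    condition = tuple((sw >> b) & 1 for b in (14, 10, 9, 8))
    flags = [name for bit, name in STATUS_FLAGS.items() if sw & (1 << bit)]
    return {"top_of_stack": f"FPR{tosp}", "condition_codes": condition, "flags": flags}


decoded = decode_status_word(0x3884)  # hypothetical sample value
```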
The FPU control word register allows a programmer to manage
the FPU processing options. Figure 13 shows the format of this
register.

Symbol | Description                         | Bits
Y      | Infinity Bit (80287 compatibility)  | 12
RC     | Rounding Control                    | 11-10
PC     | Precision Control                   | 9-8

Exception Masks
PM     | Precision                           | 5
UM     | Underflow                           | 4
OM     | Overflow                            | 3
ZM     | Zero Divide                         | 2
DM     | Denormalized Operation              | 1
IM     | Invalid Operation                   | 0

Bits not listed are reserved.

Rounding Control Information
00b = Round to the nearest or even number
01b = Round down toward negative infinity
10b = Round up toward positive infinity
11b = Truncate toward zero

Precision Control Information
00b = 24 bits Single Precision Real
01b = Reserved
10b = 53 bits Double Precision Real
11b = 64 bits Extended Precision Real


Figure 13. FPU Control Word Register
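
The rounding, precision, and mask fields of Figure 13 can be decoded with a short Python sketch such as the one below. All names are hypothetical. The sample value 037Fh is the control word that the x86 FNINIT instruction loads, which is general x87 behavior rather than something stated in this data sheet.

```python
ROUNDING = {
    0b00: "round to nearest or even",
    0b01: "round down toward negative infinity",
    0b10: "round up toward positive infinity",
    0b11: "truncate toward zero",
}
PRECISION = {
    0b00: "24 bits (single)",
    0b01: "reserved",
    0b10: "53 bits (double)",
    0b11: "64 bits (extended)",
}
# Exception mask bits in ascending bit order (IM is bit 0, PM is bit 5).
MASKS = ["IM", "DM", "ZM", "OM", "UM", "PM"]


def decode_control_word(cw):
    """Decode the RC, PC, and exception-mask fields of a 16-bit control word."""
    return {
        "rounding": ROUNDING[(cw >> 10) & 0b11],   # bits 11-10
        "precision": PRECISION[(cw >> 8) & 0b11],  # bits 9-8
        "masked": [name for bit, name in enumerate(MASKS) if cw & (1 << bit)],
    }


default_cw = decode_control_word(0x037F)
```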


The FPU tag word register contains information about the
registers in the register stack. Figure 14 shows the format of this
register.

Bits  | Field
15-14 | TAG (FPR7)
13-12 | TAG (FPR6)
11-10 | TAG (FPR5)
9-8   | TAG (FPR4)
7-6   | TAG (FPR3)
5-4   | TAG (FPR2)
3-2   | TAG (FPR1)
1-0   | TAG (FPR0)

Tag Values
00 = Valid
01 = Zero
10 = Special
11 = Empty


Figure 14. FPU Tag Word Register
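
Because each register owns a two-bit tag, the tag word can be unpacked with a simple loop. The Python sketch below is an illustration only. The names are hypothetical and the sample values are invented.

```python
TAG_MEANING = {0b00: "Valid", 0b01: "Zero", 0b10: "Special", 0b11: "Empty"}


def decode_tag_word(tw):
    """Map each of FPR0-FPR7 to the meaning of its 2-bit tag."""
    return {f"FPR{i}": TAG_MEANING[(tw >> (2 * i)) & 0b11] for i in range(8)}


all_empty = decode_tag_word(0xFFFF)
mixed = decode_tag_word(0xFFF4)
```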
Floating-Point Register Data Types


Floating-point registers use four different types of data:
packed decimal, single-precision real, double-precision real,
and extended-precision real. Figures 15 and 16 show the
formats for these registers.

Description                                   | Bits
Sign Bit                                      | 79
Ignored on Load, Zeros on Store               | 78-72
Precision: 18 Digits, 72 Bits Used, 4 Bits/Digit | 71-0


Figure 15. Packed Decimal Data Register

Format                  | Sign Bit (S) | Biased Exponent | Significand
Single-Precision Real   | 31           | 30-23           | 22-0
Double-Precision Real   | 63           | 62-52           | 51-0
Extended-Precision Real | 79           | 78-64           | 63-0 (bit 63 is the integer bit, I)


Figure 16. Precision Real Data Registers
MMX™/3DNow!™ Registers


The AMD-K6-2 processor implements eight 64-bit
MMX/3DNow! registers for use by multimedia software. These
registers are mapped on the floating-point register stack. The
MMX and 3DNow! instructions refer to these registers as mm0
to mm7. Figure 17 shows the format of these registers. For more
information, see the AMD-K6® Processor Multimedia Technology
Manual, order# 20726 and the 3DNow!™ Technology Manual,
order# 21928.

[Figure 17 layout: eight 64-bit registers, mm0 through mm7.]


Figure 17. MMX™/3DNow!™ Registers


MMX™ Data Types


For the MMX instructions, the MMX registers use three types of
data: packed bytes (eight 8-bit integers), packed words (four
16-bit integers), and packed doublewords (two 32-bit integers).
Figure 18 on page 30 shows the format of these data types.

Data Type                 | Elements      | Bit Ranges
Packed Bytes Integer      | 8 bytes       | 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, 7-0
Packed Words Integer      | 4 words       | 63-48, 47-32, 31-16, 15-0
Packed Doubleword Integer | 2 doublewords | 63-32 (Doubleword 1), 31-0 (Doubleword 0)


Figure 18. MMX™ Data Types
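
To make the packed-word layout concrete, the following Python sketch splits a 64-bit value into four 16-bit lanes and adds two such values lane by lane with wraparound, in the manner of the MMX PADDW instruction. The function names are hypothetical, and the code models the data format only, not instruction timing or execution.

```python
def unpack_words(q):
    """Return the four 16-bit lanes of a 64-bit value, lowest lane first."""
    return [(q >> (16 * i)) & 0xFFFF for i in range(4)]


def pack_words(words):
    """Combine four 16-bit lanes (lowest first) into one 64-bit value."""
    q = 0
    for i, w in enumerate(words):
        q |= (w & 0xFFFF) << (16 * i)
    return q


def packed_add_words(a, b):
    """Lane-wise 16-bit addition; each lane wraps around independently."""
    return pack_words([x + y for x, y in zip(unpack_words(a), unpack_words(b))])


total = packed_add_words(0x0001_FFFF_0002_0003, 0x0001_0001_0001_0001)
```

The lane holding FFFFh wraps to 0000h without carrying into its neighbor, which is the property that distinguishes packed arithmetic from a single 64-bit add.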


3DNow!™ Data Types


For 3DNow! instructions, the MMX/3DNow! registers use
packed single-precision real data. Figure 19 shows the format of
the 3DNow! data type.

Packed Single Precision Floating Point

Element | Sign Bit (S) | Biased Exponent | Significand
Upper   | 63           | 62-55           | 54-32
Lower   | 31           | 30-23           | 22-0


Figure 19. 3DNow!™ Data Types
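
The packed single-precision layout can be reproduced with Python's standard struct module. This sketch is illustrative only. The function names are hypothetical, and it simply places two IEEE single-precision values in the lower and upper halves of a 64-bit integer.

```python
import struct


def pack_pair(lower, upper):
    """Place two single-precision values in bits 31-0 and 63-32 of a 64-bit value."""
    return int.from_bytes(struct.pack("<2f", lower, upper), "little")


def unpack_pair(q):
    """Recover the (lower, upper) single-precision values from a 64-bit value."""
    return struct.unpack("<2f", q.to_bytes(8, "little"))


def fields(word32):
    """Split one 32-bit element into (sign, biased exponent, significand)."""
    return (word32 >> 31) & 1, (word32 >> 23) & 0xFF, word32 & 0x7FFFFF


q = pack_pair(1.0, -2.0)
lower_fields = fields(q & 0xFFFFFFFF)
upper_fields = fields(q >> 32)
```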
EFLAGS Register


The EFLAGS register provides for three different types of
flags: system, control, and status. The system flags provide
operating system controls, the control flag provides directional
information for string operations, and the status flags provide
information resulting from logical and arithmetic operations.
Figure 20 shows the format of this register.

Symbol | Description               | Bits
ID     | ID Flag                   | 21
VIP    | Virtual Interrupt Pending | 20
VIF    | Virtual Interrupt Flag    | 19
AC     | Alignment Check           | 18
VM     | Virtual-8086 Mode         | 17
RF     | Resume Flag               | 16
NT     | Nested Task               | 14
IOPL   | I/O Privilege Level       | 13-12
OF     | Overflow Flag             | 11
DF     | Direction Flag            | 10
IF     | Interrupt Flag            | 9
TF     | Trap Flag                 | 8
SF     | Sign Flag                 | 7
ZF     | Zero Flag                 | 6
AF     | Auxiliary Flag            | 4
PF     | Parity Flag               | 2
CF     | Carry Flag                | 0

Bits not listed are reserved.


Figure 20. EFLAGS Register
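
The bit positions in Figure 20 translate directly into a lookup table. The Python sketch below is a hypothetical helper for reading an EFLAGS image, for example one taken from a debugger dump. The names and sample values are assumptions made for the illustration.

```python
# Single-bit flags in ascending bit order, taken from Figure 20.
EFLAGS_BITS = [
    (0, "CF"), (2, "PF"), (4, "AF"), (6, "ZF"), (7, "SF"), (8, "TF"),
    (9, "IF"), (10, "DF"), (11, "OF"), (14, "NT"), (16, "RF"), (17, "VM"),
    (18, "AC"), (19, "VIF"), (20, "VIP"), (21, "ID"),
]


def decode_eflags(value):
    """Return (names of set flags, IOPL) for a 32-bit EFLAGS image."""
    flags = [name for bit, name in EFLAGS_BITS if value & (1 << bit)]
    iopl = (value >> 12) & 0b11   # two-bit I/O privilege level, bits 13-12
    return flags, iopl


flags, iopl = decode_eflags(0x00000246)
```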
Control Registers


The five control registers contain system control bits and
pointers. Figures 21 through 25 show the formats of these
registers.

Symbol | Description                  | Bit
MCE    | Machine Check Enable         | 6
PSE    | Page Size Extensions         | 4
DE     | Debugging Extensions         | 3
TSD    | Time Stamp Disable           | 2
PVI    | Protected Virtual Interrupts | 1
VME    | Virtual-8086 Mode Extensions | 0

Bits not listed are reserved.


Figure 21. Control Register 4 (CR4)





Symbol | Description         | Bits
       | Page Directory Base | 31-12
PCD    | Page Cache Disable  | 4
PWT    | Page Writethrough   | 3

Bits not listed are reserved.


Figure 22. Control Register 3 (CR3)





[Figure 23 layout: Page Fault Linear Address, bits 31-0.]


Figure 23. Control Register 2 (CR2)


[Figure 24 layout: bits 31-0 are reserved.]


Figure 24. Control Register 1 (CR1)





Symbol | Description          | Bit
PG     | Paging               | 31
CD     | Cache Disable        | 30
NW     | Not Writethrough     | 29
AM     | Alignment Mask       | 18
WP     | Write Protect        | 16
NE     | Numeric Error        | 5
ET     | Extension Type       | 4
TS     | Task Switched        | 3
EM     | Emulation            | 2
MP     | Monitor Co-processor | 1
PE     | Protection Enabled   | 0

Bits not listed are reserved.


Figure 25. Control Register 0 (CR0)
Debug Registers


Figures 26 through 29 show the 32-bit debug registers
supported by the processor.

Symbol | Description                         | Bits
LEN 3  | Length of Breakpoint #3             | 31-30
R/W 3  | Type of Transaction(s) to Trap      | 29-28
LEN 2  | Length of Breakpoint #2             | 27-26
R/W 2  | Type of Transaction(s) to Trap      | 25-24
LEN 1  | Length of Breakpoint #1             | 23-22
R/W 1  | Type of Transaction(s) to Trap      | 21-20
LEN 0  | Length of Breakpoint #0             | 19-18
R/W 0  | Type of Transaction(s) to Trap      | 17-16
GD     | General Detect Enabled              | 13
GE     | Global Exact Breakpoint Enabled     | 9
LE     | Local Exact Breakpoint Enabled      | 8
G3     | Global Exact Breakpoint #3 Enabled  | 7
L3     | Local Exact Breakpoint #3 Enabled   | 6
G2     | Global Exact Breakpoint #2 Enabled  | 5
L2     | Local Exact Breakpoint #2 Enabled   | 4
G1     | Global Exact Breakpoint #1 Enabled  | 3
L1     | Local Exact Breakpoint #1 Enabled   | 2
G0     | Global Exact Breakpoint #0 Enabled  | 1
L0     | Local Exact Breakpoint #0 Enabled   | 0

Figure 26. Debug Register DR7
Symbol | Description                      | Bit
BT     | Breakpoint Task Switch           | 15
BS     | Breakpoint Single Step           | 14
BD     | Breakpoint Debug Access Detected | 13
B3     | Breakpoint #3 Condition Detected | 3
B2     | Breakpoint #2 Condition Detected | 2
B1     | Breakpoint #1 Condition Detected | 1
B0     | Breakpoint #0 Condition Detected | 0

Bits not listed are reserved.


Figure 27. Debug Register DR6


[Figure 28 layout: DR5, bits 31-0 reserved; DR4, bits 31-0
reserved.]


Figure 28. Debug Registers DR5 and DR4
Register | Contents                           | Bits
DR3      | Breakpoint 3 32-bit Linear Address | 31-0
DR2      | Breakpoint 2 32-bit Linear Address | 31-0
DR1      | Breakpoint 1 32-bit Linear Address | 31-0
DR0      | Breakpoint 0 32-bit Linear Address | 31-0


Figure 29. Debug Registers DR3, DR2, DR1, and DR0
The AMD-K6-2 processor Model 8/[7:0] provides seven MSRs.
The value in the ECX register selects the MSR to be addressed 
by the RDMSR and WRMSR instructions. The values in EAX 
and EDX are used as inputs and outputs by the RDMSR and 
WRMSR instructions. Table 5 lists the MSRs and the 
corresponding value of the ECX register. Figures 30 through 36 
show the MSR formats. 


Table 5. AMD-K6®-2 Processor Model 8/[7:0] MSRs

Model-Specific Register                       | Value of ECX
Machine Check Address Register (MCAR)         | 00h
Machine Check Type Register (MCTR)            | 01h
Test Register 12 (TR12)                       | 0Eh
Time Stamp Counter (TSC)                      | 10h
Extended Feature Enable Register (EFER)       | C000_0080h
SYSCALL/SYSRET Target Address Register (STAR) | C000_0081h
Write Handling Control Register (WHCR)        | C000_0082h





For more information about the RDMSR and WRMSR 
instructions, see the AMD K86™ Family BIOS and Software Tools 
Development Guide, order# 21062. 
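
As a rough illustration of how the ECX values in Table 5 and the EDX:EAX register pair fit together, the Python sketch below keeps the MSR addresses in a dictionary and converts between a 64-bit MSR value and its two 32-bit halves. The helper names are hypothetical. The convention that EDX carries the upper 32 bits and EAX the lower 32 bits is the standard x86 behavior of RDMSR and WRMSR, not something spelled out in this section. The code does not execute either instruction, which requires privileged code.

```python
# ECX values from Table 5 (Model 8/[7:0]).
MSR_ADDRESS = {
    "MCAR": 0x00,
    "MCTR": 0x01,
    "TR12": 0x0E,
    "TSC": 0x10,
    "EFER": 0xC0000080,
    "STAR": 0xC0000081,
    "WHCR": 0xC0000082,
}


def combine_edx_eax(edx, eax):
    """Form the 64-bit MSR value from the halves returned by RDMSR."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def split_edx_eax(value):
    """Split a 64-bit MSR value into the (EDX, EAX) pair expected by WRMSR."""
    return (value >> 32) & 0xFFFFFFFF, value & 0xFFFFFFFF


tsc = combine_edx_eax(0x00000001, 0x80000000)  # invented sample halves
```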


MCAR and MCTR. The AMD-K6-2 processor does not support the
generation of a machine check exception. However, the
processor does provide a 64-bit machine check address register
(MCAR), a 64-bit machine check type register (MCTR), and a
machine check enable (MCE) bit in CR4. Because the processor
does not support machine check exceptions, the contents of the
MCAR and MCTR are only affected by the WRMSR instruction
and by RESET being sampled asserted (where all bits in each
register are reset to 0).


[Figure 30 layout: MCAR, bits 63-0.]


Figure 30. Machine-Check Address Register (MCAR)
[Figure 31 layout: bits 63-5 reserved; MCTR, bits 4-0.]


Figure 31. Machine-Check Type Register (MCTR)


Test Register 12 (TR12). Test register 12 provides a method for
disabling the L1 caches. Figure 32 shows the format of TR12.

Symbol | Description       | Bit
CI     | Cache Inhibit Bit | 3

Bits not listed are reserved.


Figure 32. Test Register 12 (TR12)

















Time Stamp Counter. With each processor clock cycle, the 
processor increments the 64-bit time stamp counter (TSC) MSR. 
Figure 33 shows the format of the TSC. 


[Figure 33 layout: TSC, bits 63-0.]


Figure 33. Time Stamp Counter (TSC)
Extended Feature Enable Register (EFER)-Model 8/[7:0]. The extended
feature enable register (EFER) contains the control bits that
enable the extended features of the AMD-K6-2 processor.
Figure 34 shows the format of the EFER register, and Table 6
defines the function of each bit in the EFER register.


Note: The EFER register as defined in the Model 8/[7:0] has
changed in the Model 8/[F:8]. See “Extended Feature Enable
Register (EFER)-Model 8/[F:8]” on page 50.


Symbol | Description                  | Bit
SCE    | System Call/Return Extension | 0

Bits 63-1 are reserved.


Figure 34. Extended Feature Enable Register (EFER)-Model 8/[7:0]


Table 6. Extended Feature Enable Register (EFER)-Model 8/[7:0] Definition

Bit  | Description                 | R/W
63-1 | Reserved                    | R
0    | System Call Extension (SCE) | R/W











SYSCALL/SYSRET Target Address Register (STAR).
The SYSCALL/SYSRET target address register (STAR)
contains the target EIP address used by the SYSCALL
instruction and the 16-bit code and stack segment selector
bases used by the SYSCALL and SYSRET instructions. Figure 35
shows the format of the STAR register, and Table 7 on
page 40 defines the function of each bit of the STAR register.
For more information, see the SYSCALL and SYSRET Instruction
Specification Application Note, order# 21086.

Bits  | Field
63-48 | SYSRET CS Selector and SS Selector Base
47-32 | SYSCALL CS Selector and SS Selector Base
31-0  | Target EIP Address


Figure 35. SYSCALL/SYSRET Target Address Register (STAR)
Table 7. SYSCALL/SYSRET Target Address Register (STAR) Definition

Bit   | Description                     | R/W
63-48 | SYSRET CS and SS Selector Base  | R/W
47-32 | SYSCALL CS and SS Selector Base | R/W
31-0  | Target EIP Address              | R/W
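
The three STAR fields in Table 7 can be extracted from a 64-bit value as in the following Python sketch. The function name and the sample selector and address values are hypothetical. The sketch shows only the field boundaries, not how SYSCALL or SYSRET derive the final CS and SS selectors from the bases.

```python
def decode_star(star):
    """Split a 64-bit STAR value into its three fields (Table 7)."""
    return {
        "sysret_selector_base": (star >> 48) & 0xFFFF,   # bits 63-48
        "syscall_selector_base": (star >> 32) & 0xFFFF,  # bits 47-32
        "target_eip": star & 0xFFFFFFFF,                 # bits 31-0
    }


# Invented sample: SYSRET base 001Bh, SYSCALL base 0008h, EIP 8000_1000h.
sample = (0x001B << 48) | (0x0008 << 32) | 0x80001000
star_fields = decode_star(sample)
```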

















Write Handling Control Register (WHCR)-Model 8/[7:0].
The write handling control register (WHCR) is an MSR that
contains three fields: the WCDE bit, the write allocate enable
limit (WAELIM) field, and the write allocate enable
15-to-16-Mbyte (WAE15M) bit. Figure 36 shows the format of
WHCR. See “Write Allocate” on page 186 for more information.


Note: The WHCR register as defined in the Model 8/[7:0] has
changed in the Model 8/[F:8]. See “Write Handling Control
Register (WHCR)-Model 8/[F:8]” on page 51.

Symbol | Description                         | Bits
WCDE   | Always program to 0                 | 8
WAELIM | Write Allocate Enable Limit         | 7-1
WAE15M | Write Allocate Enable 15-to-16-Mbyte | 0

Bits 63-9 are reserved.


Note: Hardware RESET initializes this MSR to all zeros.


Figure 36. Write Handling Control Register (WHCR)-Model 8/[7:0]
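
A value for the Model 8/[7:0] WHCR can be assembled from its fields as sketched below in Python. The function names are hypothetical, and the sketch deliberately says nothing about what a given WAELIM value means, because that is defined in the “Write Allocate” section rather than here. It only packs and unpacks the bit fields of Figure 36 and keeps WCDE at 0 as the figure requires.

```python
def make_whcr(waelim, wae15m):
    """Pack WAELIM (7 bits) and WAE15M (1 bit); WCDE (bit 8) is left at 0."""
    if not 0 <= waelim <= 0x7F:
        raise ValueError("WAELIM is a 7-bit field")
    return (waelim << 1) | (1 if wae15m else 0)


def decode_whcr(value):
    """Unpack the three defined fields of a Model 8/[7:0] WHCR value."""
    return {
        "WCDE": (value >> 8) & 1,
        "WAELIM": (value >> 1) & 0x7F,
        "WAE15M": value & 1,
    }


whcr = make_whcr(0x10, True)  # invented sample field values
```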


Memory Management Registers


The AMD-K6-2 processor controls segmented memory
management with the registers listed in Table 8. Figure 37 on
page 41 shows the formats of these registers.


Table 8. Memory Management Registers

Register Name                       | Function
Global Descriptor Table Register    | Contains a pointer to the base of the global descriptor table
Interrupt Descriptor Table Register | Contains a pointer to the base of the interrupt descriptor table
Local Descriptor Table Register     | Contains a pointer to the local descriptor table of the current task
Task Register                       | Contains a pointer to the task state segment of the current task
Global and Interrupt Descriptor Table Registers

Bits  | Field
47-16 | 32-Bit Linear Base Address
15-0  | 16-Bit Limit

Local Descriptor Table Register and Task Register

Bits  | Field
15-0  | Selector
63-32 | 32-Bit Linear Base Address
31-0  | 32-Bit Limit
15-0  | Attributes


Figure 37. Memory Management Registers
Task State Segment


Figure 38 shows the format of the task state segment (TSS).

[Figure 38 labels recovered from the original layout, bits 31-0
across: TSS Limit from TR; I/O Permission Bitmap (IOPB) (up to
8 Kbytes); Interrupt Redirection Bitmap (IRB) (eight 32-bit
locations); Operating System Data Structure; saved register
fields EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX, EFLAGS, EIP, and
CR3; Link (Prior TSS Selector) at offset 0000h.]


Figure 38. Task State Segment (TSS)


The AMD-K6-2 processor can physically address up to four 
Gbytes of memory. This memory can be segmented into pages. 
The size of these pages is determined by the operating system 
design and the values set up in the page directory entries (PDE) 
and page table entries (PTE). The processor can access both 
4-Kbyte pages and 4-Mbyte pages, and the page sizes can be 
intermixed within a page directory. When the page size 
extension (PSE) bit in CR4 is set, the processor translates linear 
addresses using either the 4-Kbyte translation lookaside buffer 
(TLB) or the 4-Mbyte TLB, depending on the state of the page 
size (PS) bit in the page directory entry. Figures 39 and 40 show 
how 4-Kbyte and 4-Mbyte page translations work. 
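
The way a 32-bit linear address is divided for the two page sizes can be sketched in Python as follows. The function names are hypothetical. The field widths used here (10-bit directory index, 10-bit table index, and 12-bit offset for 4-Kbyte pages; 10-bit directory index and 22-bit offset for 4-Mbyte pages) are the standard x86 split, stated as an assumption because the figures are not legible in this copy.

```python
def split_4k(linear):
    """Return (page directory index, page table index, page offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def split_4m(linear):
    """Return (page directory index, page offset) for a 4-Mbyte page."""
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF


small = split_4k(0xC0801234)  # invented sample linear address
large = split_4m(0xC0801234)
```

Both page sizes use the same directory index. Whether the directory entry points to a page table or directly to a 4-Mbyte page frame is decided by the PS bit in that entry.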


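[Figure 39 layout: the linear address supplies a page directory
offset, a page table offset, and a page offset; translation
proceeds through the page directory and the page table to a
4-Kbyte page frame.]


Figure 39. 4-Kbyte Paging Mechanism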
[Figure 40 layout: the linear address supplies a page directory
offset and a page offset; translation proceeds through the page
directory to the physical address within a 4-Mbyte page frame.]


Figure 40. 4-Mbyte Paging Mechanism


Figures 41 through 43 show the formats of the PDE and PTE. 
These entries contain information regarding the location of 
pages and their status. 
Symbol | Description             | Bits
       | Page Table Base Address | 31-12
AVL    | Available to Software   | 11-9
       | Reserved                | 8
PS     | Page Size               | 7
       | Reserved                | 6
A      | Accessed                | 5
PCD    | Page Cache Disable      | 4
PWT    | Page Writethrough       | 3
U/S    | User/Supervisor         | 2
W/R    | Write/Read              | 1
P      | Present (valid)         | 0


Figure 41. Page Directory Entry 4-Kbyte Page Table (PDE)





Symbol | Description                | Bits
       | Physical Page Base Address | 31-22
       | Reserved                   | 21-12
AVL    | Available to Software      | 11-9
       | Reserved                   | 8
PS     | Page Size                  | 7
       | Reserved                   | 6
A      | Accessed                   | 5
PCD    | Page Cache Disable         | 4
PWT    | Page Writethrough          | 3
U/S    | User/Supervisor            | 2
W/R    | Write/Read                 | 1
P      | Present (valid)            | 0


Figure 42. Page Directory Entry 4-Mbyte Page Table (PDE)
Symbol | Description                | Bits
       | Physical Page Base Address | 31-12
AVL    | Available to Software      | 11-9
       | Reserved                   | 8-7
D      | Dirty                      | 6
A      | Accessed                   | 5
PCD    | Page Cache Disable         | 4
PWT    | Page Writethrough          | 3
U/S    | User/Supervisor            | 2
W/R    | Write/Read                 | 1
P      | Present (valid)            | 0


Figure 43. Page Table Entry (PTE)
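
A page table entry laid out as in Figure 43 can be decoded with the short Python sketch below. The function name and the sample entry are hypothetical, and the sketch covers only the 4-Kbyte PTE format, with the base address taken from bits 31-12.

```python
# Single-bit PTE fields in ascending bit order (Figure 43).
PTE_BITS = [(0, "P"), (1, "W/R"), (2, "U/S"), (3, "PWT"), (4, "PCD"), (5, "A"), (6, "D")]


def decode_pte(entry):
    """Return the physical page base, AVL field, and set flags of a PTE."""
    return {
        "page_base": entry & 0xFFFFF000,   # bits 31-12
        "avl": (entry >> 9) & 0b111,       # bits 11-9
        "flags": [name for bit, name in PTE_BITS if entry & (1 << bit)],
    }


pte = decode_pte(0x00123067)  # invented sample entry
```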


Descriptors and Gates 


There are various types of structures and registers in the x86 
architecture that define, protect, and isolate code segments, 
data segments, task state segments, and gates. These structures 
are called descriptors. 


Figure 44 on page 47 shows the application segment descriptor 
format. Table 9 contains information describing the memory 
segment type to which the descriptor points. The application 
segment descriptor is used to point to either a data or code 
segment. 


Figure 45 on page 48 shows the system segment descriptor 
format. Table 10 contains information describing the type of 
segment or gate to which the descriptor points. The system 
segment descriptor is used to point to a task state segment, a 
call gate, or a local descriptor table. 


The AMD-K6-2 processor uses gates to transfer control between 
executable segments with different privilege levels. Figure 46 
on page 49 shows the format of the gate descriptor types. Table 
10 contains information describing the type of segment or gate 
to which the descriptor points. 





Symbol | Description                | Bits
G      | Granularity                | 23
D      | 32-Bit/16-Bit              | 22
AVL    | Available to Software      | 20
P      | Present/Valid Bit          | 15
DPL    | Descriptor Privilege Level | 14-13
DT     | Descriptor Type            | 12
Type   | See Table 9                | 11-8

[Figure 44 layout, upper doubleword: Base Address 31-24 (bits
31-24), G, D, Reserved, AVL, Segment Limit 19-16 (bits 19-16),
P, DPL, DT, Type, Base Address 23-16 (bits 7-0). Lower
doubleword: Base Address 15-0 (bits 31-16), Segment Limit 15-0
(bits 15-0).]


Figure 44. Application Segment Descriptor
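
Because the base address and limit are scattered across the two doublewords of a descriptor, reassembling them is a common source of mistakes. The Python sketch below shows one way to do it, under the standard x86 descriptor layout. The function name is hypothetical, and the sample doublewords describe an invented flat code segment chosen for the example.

```python
def decode_segment_descriptor(low, high):
    """Decode an 8-byte segment descriptor given as its two 32-bit doublewords."""
    base = ((low >> 16) & 0xFFFF) | ((high & 0xFF) << 16) | (high & 0xFF000000)
    limit = (low & 0xFFFF) | (high & 0x000F0000)   # 20-bit segment limit
    return {
        "base": base,
        "limit": limit,
        "G": (high >> 23) & 1,
        "D": (high >> 22) & 1,
        "AVL": (high >> 20) & 1,
        "P": (high >> 15) & 1,
        "DPL": (high >> 13) & 0b11,
        "DT": (high >> 12) & 1,
        "type": (high >> 8) & 0xF,   # index into Table 9 when DT = 1
    }


desc = decode_segment_descriptor(0x0000FFFF, 0x00CF9A00)
```

For this sample, type Ah corresponds to the Execute/Read code segment entry in Table 9.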


Table 9. Application Segment Types

Type | Data/Code | Description
0    | Data      | Read-Only
1    | Data      | Read-Only, Accessed
2    | Data      | Read/Write
3    | Data      | Read/Write, Accessed
4    | Data      | Read-Only, Expand-down
5    | Data      | Read-Only, Expand-down, Accessed
6    | Data      | Read/Write, Expand-down
7    | Data      | Read/Write, Expand-down, Accessed
8    | Code      | Execute-Only
9    | Code      | Execute-Only, Accessed
A    | Code      | Execute/Read
B    | Code      | Execute/Read, Accessed
C    | Code      | Execute-Only, Conforming
D    | Code      | Execute-Only, Conforming, Accessed
E    | Code      | Execute/Read-Only, Conforming
F    | Code      | Execute/Read-Only, Conforming, Accessed
Symbol | Description                | Bits
G      | Granularity                | 23
X      | Not Needed                 | 22
AVL    | Availability to Software   | 20
P      | Present/Valid Bit          | 15
DPL    | Descriptor Privilege Level | 14-13
DT     | Descriptor Type            | 12
Type   | See Table 10               | 11-8

[Figure 45 layout, upper doubleword: Base Address 31-24 (bits
31-24), G, X, Reserved, AVL, Segment Limit 19-16 (bits 19-16),
P, DPL, DT, Type, Base Address 23-16 (bits 7-0). Lower
doubleword: Base Address 15-0 (bits 31-16), Segment Limit 15-0
(bits 15-0).]


Figure 45. System Segment Descriptor


Table 10. System Segment and Gate Types

Type | Description
0    | Reserved
1    | Available 16-bit TSS
2    | LDT
3    | Busy 16-bit TSS
4    | 16-bit Call Gate
5    | Task Gate
6    | 16-bit Interrupt Gate
7    | 16-bit Trap Gate
8    | Reserved
9    | Available 32-bit TSS
A    | Reserved
B    | Busy 32-bit TSS
C    | 32-bit Call Gate
D    | Reserved
E    | 32-bit Interrupt Gate
F    | 32-bit Trap Gate
Symbol | Description                | Bits
P      | Present/Valid Bit          | 15
DPL    | Descriptor Privilege Level | 14-13
DT     | Descriptor Type            | 12
Type   | See Table 10               | 11-8

[Figure 46 labels recovered from the original layout: Segment
Selector; Offset 15-0; Reserved.]


Figure 46. Gate Descriptor










































































Exceptions and Interrupts


Table 11 summarizes the exceptions and interrupts.


Table 11. Summary of Exceptions and Interrupts

Interrupt Number | Interrupt Type          | Cause
0                | Divide by Zero Error    | DIV, IDIV
1                | Debug                   | Debug trap or fault
2                | Non-Maskable Interrupt  | NMI signal sampled asserted
3                | Breakpoint              | INT 3
4                | Overflow                | INTO
5                | Bounds Check            | BOUND
6                | Invalid Opcode          | Invalid instruction
7                | Device Not Available    | ESC and WAIT
8                | Double Fault            | Fault occurs while handling a fault
9                | Reserved - Interrupt 13 | -
10               | Invalid TSS             | Task switch to an invalid segment
11               | Segment Not Present     | Instruction loads a segment and present bit is 0 (invalid segment)
12               | Stack Segment           | Stack operation causes limit violation or present bit is 0
13               | General Protection      | Segment related or miscellaneous invalid actions
14               | Page Fault              | Page protection violation or a reference to missing page
16               | Floating-Point Error    | Arithmetic error generated by floating-point instruction
17               | Alignment Check         | Access to an unaligned operand. (The AC flag and the AM bit of CR0 are set to 1.)
0-255            | Software Interrupt      | INT n
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3.2 AMD-K6°-2 Processor Model 8/[F:8] Registers 


The AMD-K6-2 processor Model 8/[F:8] implements the same seven
MSRs as the Model 8/[7:0], but the bits and fields within the
EFER and WHCR MSRs are not defined identically. Model
8/[F:8] also supports three additional MSRs: UWCCR, PSOR,
and PFIR. For more information, see the AMD-K6® Processor
BIOS Design Application Note, order# 21329. Table 12 lists the
MSRs and the corresponding value of the ECX register.


Table 12. AMD-K6®-2 Processor Model 8/[F:8] MSRs

Model-Specific Register | Value of ECX
Machine Check Address Register (MCAR) | 00h
Machine Check Type Register (MCTR) | 01h
Test Register 12 (TR12) | 0Eh
Time Stamp Counter (TSC) | 10h
Extended Feature Enable Register (EFER) | C000_0080h
SYSCALL/SYSRET Target Address Register (STAR) | C000_0081h
Write Handling Control Register (WHCR) | C000_0082h
UC/WC Cacheability Control Register (UWCCR) | C000_0085h
Processor State Observability Register (PSOR) | C000_0087h
Page Flush/Invalidate Register (PFIR) | C000_0088h
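For tooling that works with these registers, the ECX values in Table 12 can be kept in a lookup table. This is a minimal sketch; the dictionary name `MSR_ECX` and the helper `msr_name` are illustrative assumptions, and the code only models the table, it does not access hardware.

```python
# ECX values from Table 12, keyed by the register abbreviation.
MSR_ECX = {
    "MCAR": 0x00,
    "MCTR": 0x01,
    "TR12": 0x0E,
    "TSC": 0x10,
    "EFER": 0xC0000080,
    "STAR": 0xC0000081,
    "WHCR": 0xC0000082,
    "UWCCR": 0xC0000085,
    "PSOR": 0xC0000087,
    "PFIR": 0xC0000088,
}


def msr_name(ecx):
    """Return the abbreviation for an ECX value, or None if it is not in Table 12."""
    for name, value in MSR_ECX.items():
        if value == ecx:
            return name
    return None


if __name__ == "__main__":
    print(hex(MSR_ECX["EFER"]), msr_name(0xC0000088))
```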














Extended Feature Enable Register (EFER) - Model 8/[F:8]

The Extended Feature Enable Register (EFER) contains the
control bits that enable the extended features of the processor.
Figure 47 shows the format of the EFER register, and Table 13
on page 51 defines the function of each bit of the EFER
register.

Note: The EFER register as defined in the Model 8/[7:0] has
changed in the Model 8/[F:8]. See “Extended Feature Enable
Register (EFER) - Model 8/[7:0]” on page 39.

[Figure 47 bit-layout diagram: bits 63-4 are reserved.]

Symbol | Description | Bit
EWBEC | EWBE Control | 3-2
DPE | Data Prefetch Enable | 1
SCE | System Call Extension | 0

Figure 47. Extended Feature Enable Register (EFER) - Model 8/[F:8]


Table 13. Extended Feature Enable Register (EFER) - Model 8/[F:8] Definition

Bit | Description | R/W | Function
63-4 | Reserved | R | Writing a 1 to any reserved bit causes a general protection fault to occur. All reserved bits are always read as 0.
3-2 | EWBE Control (EWBEC) | R/W | This 2-bit field controls the behavior of the processor with respect to the ordering of write cycles and the EWBE# signal. EFER[3] and EFER[2] are Global EWBE Disable (GEWBED) and Speculative EWBE Disable (SEWBED), respectively.
1 | Data Prefetch Enable (DPE) | R/W | DPE must be set to 1 to enable data prefetching (this is the default setting following reset). If enabled, cache misses initiated by a memory read within a 32-byte cache line are conditionally followed by cache-line fetches of the other line in the 64-byte sector.
0 | System Call Extension (SCE) | R/W | SCE must be set to 1 to enable the usage of the SYSCALL and SYSRET instructions.

For more information on EWBEC, see “EWBE Control” on page 201.
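The bit assignments in Table 13 can be checked in software before a value is written to EFER. The sketch below assumes the 64-bit register value is held in a Python integer; `decode_efer` and `EFER_RESERVED_MASK` are hypothetical names, and raising `ValueError` merely stands in for the general protection fault that the table describes.

```python
EFER_RESERVED_MASK = 0xFFFFFFFFFFFFFFF0  # bits 63-4 are reserved


def decode_efer(value):
    """Split an EFER value into the fields defined in Table 13."""
    if value & EFER_RESERVED_MASK:
        # On the processor, writing a 1 to a reserved bit causes a
        # general protection fault; this sketch only reports the problem.
        raise ValueError("reserved EFER bits 63-4 must be 0")
    return {
        "SCE": value & 0x1,            # bit 0: System Call Extension
        "DPE": (value >> 1) & 0x1,     # bit 1: Data Prefetch Enable
        "SEWBED": (value >> 2) & 0x1,  # bit 2: Speculative EWBE Disable
        "GEWBED": (value >> 3) & 0x1,  # bit 3: Global EWBE Disable
    }


if __name__ == "__main__":
    print(decode_efer(0b1011))
```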
Write Handling Control Register (WHCR) - Model 8/[F:8]

The Write Handling Control Register (WHCR) is an MSR that
contains two fields: the Write Allocate Enable Limit
(WAELIM) field, and the Write Allocate Enable 15-to-16-Mbyte
(WAE15M) bit (see Figure 48). For more information, see
“Write Allocate” on page 186.

Note: The WHCR register as defined in the Model 8/[7:0] has
changed in the Model 8/[F:8]. See “Write Handling Control
Register (WHCR) - Model 8/[7:0]” on page 40.

[Figure 48 bit-layout diagram: bits 63-32, 21-17, and 15-0 are reserved.]

Symbol | Description | Bits
WAELIM | Write Allocate Enable Limit | 31-22
WAE15M | Write Allocate Enable 15-to-16-Mbyte | 16

Note: Hardware RESET initializes this MSR to all zeros.

Figure 48. Write Handling Control Register (WHCR) - Model 8/[F:8]
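The two WHCR fields in Figure 48 can be packed and unpacked with shifts and masks. This is a minimal sketch under the assumption that the caller already knows the raw WAELIM value to program (its units are described in the “Write Allocate” section referenced above, not here); `pack_whcr` and `unpack_whcr` are hypothetical helper names.

```python
def pack_whcr(waelim, wae15m):
    """Build a WHCR value from a raw 10-bit WAELIM field and the WAE15M bit."""
    if not 0 <= waelim <= 0x3FF:
        raise ValueError("WAELIM occupies bits 31-22, so it must fit in 10 bits")
    return (waelim << 22) | (int(bool(wae15m)) << 16)


def unpack_whcr(value):
    """Return (WAELIM, WAE15M) extracted from a WHCR value."""
    return (value >> 22) & 0x3FF, (value >> 16) & 0x1


if __name__ == "__main__":
    print(hex(pack_whcr(1, True)), unpack_whcr(0x00410000))
```

All other bits are left at 0, which matches the reserved positions in the figure.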


UC/WC Cacheability Control Register (UWCCR)

The AMD-K6-2 processor Model 8/[F:8] provides two variable-range
Memory Type Range Registers (MTRRs), MTRR0 and MTRR1, that
each specify a range of memory. Each range can
be defined as uncacheable (UC) or write-combining (WC)
memory. For more detailed information on UWCCR, see
“UC/WC Cacheability Control Register (UWCCR)” on page 203.

[Figure 49 bit-layout diagram. MTRR1 occupies bits 63-32: Physical Base Address 1 (63-49), Physical Address Mask 1 (48-34), WC1 (33), UC1 (32). MTRR0 occupies bits 31-0: Physical Base Address 0 (31-17), Physical Address Mask 0 (16-2), WC0 (1), UC0 (0).]

Symbol | Description | Bit
UC1 | Uncacheable Memory Type | 32
WC1 | Write-Combining Memory Type | 33
UC0 | Uncacheable Memory Type | 0
WC0 | Write-Combining Memory Type | 1

Figure 49. UC/WC Cacheability Control Register (UWCCR)
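Because MTRR0 and MTRR1 share the same layout within their respective 32-bit halves of UWCCR, one helper can decode both. The following sketch only separates the fields shown in Figure 49; it does not interpret the base and mask values, and the names `decode_mtrr` and `decode_uwccr` are illustrative assumptions.

```python
def decode_mtrr(half):
    """Decode one 32-bit MTRR half of UWCCR into its four fields."""
    return {
        "uc": half & 0x1,               # bit 0 (UC0) or bit 32 (UC1)
        "wc": (half >> 1) & 0x1,        # bit 1 (WC0) or bit 33 (WC1)
        "mask": (half >> 2) & 0x7FFF,   # 15-bit Physical Address Mask
        "base": (half >> 17) & 0x7FFF,  # 15-bit Physical Base Address
    }


def decode_uwccr(value):
    """Split the 64-bit UWCCR value into MTRR0 (low half) and MTRR1 (high half)."""
    return {
        "MTRR0": decode_mtrr(value & 0xFFFFFFFF),
        "MTRR1": decode_mtrr((value >> 32) & 0xFFFFFFFF),
    }


if __name__ == "__main__":
    print(decode_uwccr((1 << 32) | (0x7FFF << 17) | (1 << 2) | 0b10))
```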
Processor State Observability Register (PSOR)

The AMD-K6-2 processor Model 8/[F:8] provides the Processor
State Observability Register (PSOR) (see Figure 50).

[Figure 50 bit-layout diagram: bits 63-9 are reserved.]

Symbol | Description | Bit
NOL2 | No L2 Functionality | 8
STEP | Processor Stepping | 7-4
BF | Bus Frequency Divisor | 2-0

Figure 50. Processor State Observability Register (PSOR)
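The PSOR fields in Figure 50 can be extracted as follows. This is a sketch only: it returns the raw field values and makes no assumption about how the BF encoding maps to an actual divisor, since that mapping is not given in this section. The function name `decode_psor` is hypothetical.

```python
def decode_psor(value):
    """Extract the NOL2, STEP and BF fields from a PSOR value."""
    return {
        "NOL2": (value >> 8) & 0x1,  # bit 8: No L2 Functionality
        "STEP": (value >> 4) & 0xF,  # bits 7-4: Processor Stepping
        "BF": value & 0x7,           # bits 2-0: Bus Frequency Divisor (raw)
    }


if __name__ == "__main__":
    print(decode_psor(0x1C5))
```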


Page Flush/Invalidate Register (PFIR)

The AMD-K6-2 processor Model 8/[F:8] contains the Page
Flush/Invalidate Register (PFIR) (see Figure 51) that allows
cache invalidation and optional flushing of a specific 4-Kbyte
page from the linear address space. For more detailed
information on PFIR, see “PFIR” on page 195.

[Figure 51 bit-layout diagram: bits 63-32, 11-9, and 7-1 are reserved.]

Symbol | Description | Bit
LINPAGE | 20-bit Linear Page Address | 31-12
PF | Page Fault Occurred | 8
F/I | Flush/Invalidate Command | 0

Figure 51. Page Flush/Invalidate Register (PFIR)
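Since LINPAGE holds the upper 20 bits of a 32-bit linear address, a PFIR value for a given 4-Kbyte page can be formed by clearing the low 12 address bits and adding the command bit. The sketch below assumes the caller supplies the F/I bit value directly (its meaning is defined in the “PFIR” section referenced above); `make_pfir` and `page_fault_occurred` are hypothetical names.

```python
def make_pfir(linear_address, fi_bit):
    """Build a PFIR value targeting the 4-Kbyte page that contains linear_address."""
    linpage = (linear_address >> 12) & 0xFFFFF  # 20-bit linear page number
    return (linpage << 12) | (fi_bit & 0x1)     # LINPAGE in 31-12, F/I in bit 0


def page_fault_occurred(pfir_value):
    """Report the PF bit (bit 8) of a PFIR value read back from the register."""
    return bool((pfir_value >> 8) & 0x1)


if __name__ == "__main__":
    print(hex(make_pfir(0x12345678, 1)))
```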
3.3 Instructions Supported by the AMD-K6®-2 Processor

This section documents all of the x86 instructions supported by
the AMD-K6-2 processor. The following tables show the
instruction mnemonic, opcode, modR/M byte, decode type, and
RISC86 operation(s) for each instruction. Tables 14 through 17
define the integer, floating-point, MMX, and 3DNow!
instructions for the AMD-K6-2 processor, respectively.

The first column in these tables indicates the instruction
mnemonic and operand types with the following notations:

reg8: byte integer register defined by instruction byte(s) or bits 5, 4, and 3 of the modR/M byte
mreg8: byte integer register or byte integer value in memory defined by the modR/M byte
reg16/32: word or doubleword integer register defined by instruction byte(s) or bits 5, 4, and 3 of the modR/M byte
mreg16/32: word or doubleword integer register, or word or doubleword integer value in memory defined by the modR/M byte
mem8: byte integer value in memory
mem16/32: word or doubleword integer value in memory
mem32/48: doubleword or 48-bit integer value in memory
mem48: 48-bit integer value in memory
mem64: 64-bit value in memory
imm8: 8-bit immediate value
imm16/32: 16-bit or 32-bit immediate value
disp8: 8-bit displacement value
disp16/32: 16-bit or 32-bit displacement value
disp32/48: doubleword or 48-bit displacement value
eXX: register width depending on the operand size
mem32real: 32-bit floating-point value in memory
mem64real: 64-bit floating-point value in memory
mem80real: 80-bit floating-point value in memory
mmreg: MMX/3DNow! register
mmreg1: MMX/3DNow! register defined by bits 5, 4, and 3 of the modR/M byte
mmreg2: MMX/3DNow! register defined by bits 2, 1, and 0 of the modR/M byte

The second and third columns list all applicable opcode bytes.

The fourth column lists the modR/M byte when used by the
instruction. The modR/M byte defines the instruction as a
register or memory form. If modR/M bits 7 and 6 are documented
as mm (memory form), mm can only be 10b, 01b or 00b.
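The modR/M patterns used in the fourth column (for example 11-010-xxx or mm-xxx-xxx) can be matched against an actual modR/M byte with a few bit operations. The following is a minimal sketch; `split_modrm` and `matches` are hypothetical helper names, and the pattern syntax is simply the one used in the tables.

```python
def split_modrm(byte):
    """Split a modR/M byte into (mod, reg, rm): bits 7-6, bits 5-3, bits 2-0."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


def matches(pattern, byte):
    """Check a modR/M byte against a table pattern such as '11-010-xxx'."""
    mod, reg, rm = split_modrm(byte)
    mod_pat, reg_pat, rm_pat = pattern.lower().split("-")
    if mod_pat == "mm":
        # Memory form: mm can only be 10b, 01b or 00b, never 11b.
        if mod == 0b11:
            return False
    elif int(mod_pat, 2) != mod:
        return False
    for pat, field in ((reg_pat, reg), (rm_pat, rm)):
        # 'xxx' is a wildcard; anything else is a 3-bit binary value.
        if pat != "xxx" and int(pat, 2) != field:
            return False
    return True


if __name__ == "__main__":
    print(split_modrm(0xD0), matches("11-010-xxx", 0xD0))
```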


The fifth column lists the type of instruction decode: short,
long, and vector. The AMD-K6-2 processor decode logic can
process two short, one long, or one vector decode per clock.
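The per-clock decode rule above can be illustrated with a simple count over a sequence of decode types. This is a deliberately simplified sketch: it models only the stated rule (two short, one long, or one vector decode per clock) with greedy pairing of adjacent short decodes, and it ignores every other constraint of the real decoders. The function name `decode_clocks` is hypothetical.

```python
def decode_clocks(decode_types):
    """Count clocks needed to decode a sequence under the two-short/one-long/one-vector rule."""
    clocks = 0
    i = 0
    while i < len(decode_types):
        kind = decode_types[i]
        if kind not in ("short", "long", "vector"):
            raise ValueError("unknown decode type: %r" % (kind,))
        next_is_short = i + 1 < len(decode_types) and decode_types[i + 1] == "short"
        if kind == "short" and next_is_short:
            i += 2  # two adjacent short decodes share one clock
        else:
            i += 1  # a lone short, a long, or a vector decode takes the clock
        clocks += 1
    return clocks


if __name__ == "__main__":
    print(decode_clocks(["short", "short", "long", "short", "vector"]))
```

Under this model, a sequence made up entirely of short decodes needs about half as many decode clocks as the same number of long or vector decodes.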


The sixth column lists the type of RISC86 operation(s) required
for the instruction. The operation types and corresponding
execution units are as follows:

load, fload, mload: load unit
store, fstore, mstore: store unit
alu: either of the integer execution units
alux: integer X execution unit only
branch: branch condition unit
float: floating-point execution unit
meu: Multimedia execution units for MMX and 3DNow! instructions
limm: load immediate, instruction control unit

Table 14. Integer Instructions




































































Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Operations
AAA | 37h | | | vector |
AAD | D5h | 0Ah | | vector |
AAM | D4h | 0Ah | | vector |
AAS | 3Fh | | | vector |
ADC mreg8, reg8 | 10h | | 11-xxx-xxx | vector |
ADC mem8, reg8 | 10h | | mm-xxx-xxx | vector |
ADC mreg16/32, reg16/32 | 11h | | 11-xxx-xxx | vector |
ADC mem16/32, reg16/32 | 11h | | mm-xxx-xxx | vector |
ADC reg8, mreg8 | 12h | | 11-xxx-xxx | vector |
ADC reg8, mem8 | 12h | | mm-xxx-xxx | vector |
ADC reg16/32, mreg16/32 | 13h | | 11-xxx-xxx | vector |
ADC reg16/32, mem16/32 | 13h | | mm-xxx-xxx | vector |
ADC AL, imm8 | 14h | | | vector |
ADC EAX, imm16/32 | 15h | | | vector |
ADC mreg8, imm8 | 80h | | 11-010-xxx | vector |
ADC mem8, imm8 | 80h | | mm-010-xxx | vector |
ADC mreg16/32, imm16/32 | 81h | | 11-010-xxx | vector |
ADC mem16/32, imm16/32 | 81h | | mm-010-xxx | vector |
ADC mreg16/32, imm8 (signed ext.) | 83h | | 11-010-xxx | vector |
ADC mem16/32, imm8 (signed ext.) | 83h | | mm-010-xxx | vector |
ADD mreg8, reg8 | 00h | | 11-xxx-xxx | short | alux
ADD mem8, reg8 | 00h | | mm-xxx-xxx | long | load, alux, store
ADD mreg16/32, reg16/32 | 01h | | 11-xxx-xxx | short | alu
ADD mem16/32, reg16/32 | 01h | | mm-xxx-xxx | long | load, alu, store
ADD reg8, mreg8 | 02h | | 11-xxx-xxx | short | alux
ADD reg8, mem8 | 02h | | mm-xxx-xxx | short | load, alux
ADD reg16/32, mreg16/32 | 03h | | 11-xxx-xxx | short | alu
ADD reg16/32, mem16/32 | 03h | | mm-xxx-xxx | short | load, alu
ADD AL, imm8 | 04h | | | short | alux
ADD EAX, imm16/32 | 05h | | | short | alu
ADD mreg8, imm8 | 80h | | 11-000-xxx | short | alux
ADD mem8, imm8 | 80h | | mm-000-xxx | long | load, alux, store
ADD mreg16/32, imm16/32 | 81h | | 11-000-xxx | short | alu
ADD mem16/32, imm16/32 | 81h | | mm-000-xxx | long | load, alu, store
ADD mreg16/32, imm8 (signed ext.) | 83h | | 11-000-xxx | short | alux
ADD mem16/32, imm8 (signed ext.) | 83h | | mm-000-xxx | long | load, alux, store
AND mreg8, reg8 | 20h | | 11-xxx-xxx | short | alux
AND mem8, reg8 | 20h | | mm-xxx-xxx | long | load, alux, store
AND mreg16/32, reg16/32 | 21h | | 11-xxx-xxx | short | alu
AND mem16/32, reg16/32 | 21h | | mm-xxx-xxx | long | load, alu, store
AND reg8, mreg8 | 22h | | 11-xxx-xxx | short | alux
AND reg8, mem8 | 22h | | mm-xxx-xxx | short | load, alux
AND reg16/32, mreg16/32 | 23h | | 11-xxx-xxx | short | alu
AND reg16/32, mem16/32 | 23h | | mm-xxx-xxx | short | load, alu
AND AL, imm8 | 24h | | | short | alux
AND EAX, imm16/32 | 25h | | | short | alu

AND mreg8, imm8 | 80h | | 11-100-xxx | short | alux
AND mem8, imm8 | 80h | | mm-100-xxx | long | load, alux, store
AND mreg16/32, imm16/32 | 81h | | 11-100-xxx | short | alu
AND mem16/32, imm16/32 | 81h | | mm-100-xxx | long | load, alu, store
AND mreg16/32, imm8 (signed ext.) | 83h | | 11-100-xxx | short | alux
AND mem16/32, imm8 (signed ext.) | 83h | | mm-100-xxx | long | load, alux, store
ARPL mreg16, reg16 | 63h | | 11-xxx-xxx | vector |
ARPL mem16, reg16 | 63h | | mm-xxx-xxx | vector |
BOUND | 62h | | | vector |
BSF reg16/32, mreg16/32 | 0Fh | BCh | 11-xxx-xxx | vector |
BSF reg16/32, mem16/32 | 0Fh | BCh | mm-xxx-xxx | vector |
BSR reg16/32, mreg16/32 | 0Fh | BDh | 11-xxx-xxx | vector |
BSR reg16/32, mem16/32 | 0Fh | BDh | mm-xxx-xxx | vector |
BSWAP EAX | 0Fh | C8h | | long | alu
BSWAP ECX | 0Fh | C9h | | long | alu
BSWAP EDX | 0Fh | CAh | | long | alu
BSWAP EBX | 0Fh | CBh | | long | alu
BSWAP ESP | 0Fh | CCh | | long | alu
BSWAP EBP | 0Fh | CDh | | long | alu
BSWAP ESI | 0Fh | CEh | | long | alu
BSWAP EDI | 0Fh | CFh | | long | alu
BT mreg16/32, reg16/32 | 0Fh | A3h | 11-xxx-xxx | vector |
BT mem16/32, reg16/32 | 0Fh | A3h | mm-xxx-xxx | vector |
BT mreg16/32, imm8 | 0Fh | BAh | 11-100-xxx | vector |
BT mem16/32, imm8 | 0Fh | BAh | mm-100-xxx | vector |
BTC mreg16/32, reg16/32 | 0Fh | BBh | 11-xxx-xxx | vector |
BTC mem16/32, reg16/32 | 0Fh | BBh | mm-xxx-xxx | vector |
BTC mreg16/32, imm8 | 0Fh | BAh | 11-111-xxx | vector |
BTC mem16/32, imm8 | 0Fh | BAh | mm-111-xxx | vector |
BTR mreg16/32, reg16/32 | 0Fh | B3h | 11-xxx-xxx | vector |
BTR mem16/32, reg16/32 | 0Fh | B3h | mm-xxx-xxx | vector |
BTR mreg16/32, imm8 | 0Fh | BAh | 11-110-xxx | vector |
BTR mem16/32, imm8 | 0Fh | BAh | mm-110-xxx | vector |
BTS mreg16/32, reg16/32 | 0Fh | ABh | 11-xxx-xxx | vector |
BTS mem16/32, reg16/32 | 0Fh | ABh | mm-xxx-xxx | vector |
BTS mreg16/32, imm8 | 0Fh | BAh | 11-101-xxx | vector |
BTS mem16/32, imm8 | 0Fh | BAh | mm-101-xxx | vector |
CALL full pointer | 9Ah | | | vector |
CALL near imm16/32 | E8h | | | short | store
CALL mem16:16/32 | FFh | | mm-011-xxx | vector |
CALL near mreg32 (indirect) | FFh | | 11-010-xxx | vector |
CALL near mem32 (indirect) | FFh | | mm-010-xxx | vector |
CBW/CWDE EAX | 98h | | | vector |
CLC | F8h | | | vector |
CLD | FCh | | | vector |
CLI | FAh | | | vector |
CLTS | 0Fh | 06h | | vector |
CMC | F5h | | | vector |
CMP mreg8, reg8 | 38h | | 11-xxx-xxx | short | alux
CMP mem8, reg8 | 38h | | mm-xxx-xxx | short | load, alux
CMP mreg16/32, reg16/32 | 39h | | 11-xxx-xxx | short | alu
CMP mem16/32, reg16/32 | 39h | | mm-xxx-xxx | short | load, alu
CMP reg8, mreg8 | 3Ah | | 11-xxx-xxx | short | alux
CMP reg8, mem8 | 3Ah | | mm-xxx-xxx | short | load, alux
CMP reg16/32, mreg16/32 | 3Bh | | 11-xxx-xxx | short | alu
CMP reg16/32, mem16/32 | 3Bh | | mm-xxx-xxx | short | load, alu
CMP AL, imm8 | 3Ch | | | short | alux
CMP EAX, imm16/32 | 3Dh | | | short | alu
CMP mreg8, imm8 | 80h | | 11-111-xxx | short | alux
CMP mem8, imm8 | 80h | | mm-111-xxx | short | load, alux
CMP mreg16/32, imm16/32 | 81h | | 11-111-xxx | short | alu
CMP mem16/32, imm16/32 | 81h | | mm-111-xxx | short | load, alu
CMP mreg16/32, imm8 (signed ext.) | 83h | | 11-111-xxx | long | load, alu
CMP mem16/32, imm8 (signed ext.) | 83h | | mm-111-xxx | long | load, alu
CMPSB mem8, mem8 | A6h | | | vector |
CMPSW mem16, mem16 | A7h | | | vector |
CMPSD mem32, mem32 | A7h | | | vector |
CMPXCHG mreg8, reg8 | 0Fh | B0h | 11-xxx-xxx | vector |
CMPXCHG mem8, reg8 | 0Fh | B0h | mm-xxx-xxx | vector |
CMPXCHG mreg16/32, reg16/32 | 0Fh | B1h | 11-xxx-xxx | vector |
CMPXCHG mem16/32, reg16/32 | 0Fh | B1h | mm-xxx-xxx | vector |
CMPXCHG8B EDX:EAX | 0Fh | C7h | 11-xxx-xxx | vector |
CMPXCHG8B mem64 | 0Fh | C7h | mm-xxx-xxx | vector |
CPUID | 0Fh | A2h | | vector |
CWD/CDQ EDX, EAX | 99h | | | vector |
DAA | 27h | | | vector |
DAS | 2Fh | | | vector |
DEC EAX | 48h | | | short | alu
DEC ECX | 49h | | | short | alu
DEC EDX | 4Ah | | | short | alu
DEC EBX | 4Bh | | | short | alu
DEC ESP | 4Ch | | | short | alu
DEC EBP | 4Dh | | | short | alu
DEC ESI | 4Eh | | | short | alu
DEC EDI | 4Fh | | | short | alu
DEC mreg8 | FEh | | 11-001-xxx | vector |
DEC mem8 | FEh | | mm-001-xxx | long | load, alux, store
DEC mreg16/32 | FFh | | 11-001-xxx | vector |
DEC mem16/32 | FFh | | mm-001-xxx | long | load, alu, store
DIV AL, mreg8 | F6h | | 11-110-xxx | vector |
DIV AL, mem8 | F6h | | mm-110-xxx | vector |
DIV EAX, mreg16/32 | F7h | | 11-110-xxx | vector |
DIV EAX, mem16/32 | F7h | | mm-110-xxx | vector |
IDIV mreg8 | F6h | | 11-111-xxx | vector |
IDIV mem8 | F6h | | mm-111-xxx | vector |
IDIV EAX, mreg16/32 | F7h | | 11-111-xxx | vector |
IDIV EAX, mem16/32 | F7h | | mm-111-xxx | vector |
IMUL reg16/32, imm16/32 | 69h | | 11-xxx-xxx | vector |
IMUL reg16/32, mreg16/32, imm16/32 | 69h | | 11-xxx-xxx | vector |
IMUL reg16/32, mem16/32, imm16/32 | 69h | | mm-xxx-xxx | vector |
IMUL reg16/32, imm8 (sign extended) | 6Bh | | 11-xxx-xxx | vector |
IMUL reg16/32, mreg16/32, imm8 (signed) | 6Bh | | 11-xxx-xxx | vector |
IMUL reg16/32, mem16/32, imm8 (signed) | 6Bh | | mm-xxx-xxx | vector |
IMUL AX, AL, mreg8 | F6h | | 11-101-xxx | vector |
IMUL AX, AL, mem8 | F6h | | mm-101-xxx | vector |
IMUL EDX:EAX, EAX, mreg16/32 | F7h | | 11-101-xxx | vector |
IMUL EDX:EAX, EAX, mem16/32 | F7h | | mm-101-xxx | vector |
IMUL reg16/32, mreg16/32 | 0Fh | AFh | 11-xxx-xxx | vector |
IMUL reg16/32, mem16/32 | 0Fh | AFh | mm-xxx-xxx | vector |
IN AL, imm8 | E4h | | | vector |
IN AX, imm8 | E5h | | | vector |
IN EAX, imm8 | E5h | | | vector |
IN AL, DX | ECh | | | vector |
IN AX, DX | EDh | | | vector |
IN EAX, DX | EDh | | | vector |
INC EAX | 40h | | | short | alu
INC ECX | 41h | | | short | alu
INC EDX | 42h | | | short | alu
INC EBX | 43h | | | short | alu
INC ESP | 44h | | | short | alu
INC EBP | 45h | | | short | alu
INC ESI | 46h | | | short | alu
INC EDI | 47h | | | short | alu
INC mreg8 | FEh | | 11-000-xxx | vector |
INC mem8 | FEh | | mm-000-xxx | long | load, alux, store
INC mreg16/32 | FFh | | 11-000-xxx | vector |
INC mem16/32 | FFh | | mm-000-xxx | long | load, alu, store
INVD | 0Fh | 08h | | vector |
INVLPG | 0Fh | 01h | mm-111-xxx | vector |
JO short disp8 | 70h | | | short | branch
JB/JNAE short disp8 | 72h | | | short | branch

JNO short disp8 | 71h | | | short | branch
JNB/JAE short disp8 | 73h | | | short | branch
JZ/JE short disp8 | 74h | | | short | branch
JNZ/JNE short disp8 | 75h | | | short | branch
JBE/JNA short disp8 | 76h | | | short | branch
JNBE/JA short disp8 | 77h | | | short | branch
JS short disp8 | 78h | | | short | branch
JNS short disp8 | 79h | | | short | branch
JP/JPE short disp8 | 7Ah | | | short | branch
JNP/JPO short disp8 | 7Bh | | | short | branch
JL/JNGE short disp8 | 7Ch | | | short | branch
JNL/JGE short disp8 | 7Dh | | | short | branch
JLE/JNG short disp8 | 7Eh | | | short | branch
JNLE/JG short disp8 | 7Fh | | | short | branch
JCXZ/JEC short disp8 | E3h | | | vector |
JO near disp16/32 | 0Fh | 80h | | short | branch
JNO near disp16/32 | 0Fh | 81h | | short | branch
JB/JNAE near disp16/32 | 0Fh | 82h | | short | branch
JNB/JAE near disp16/32 | 0Fh | 83h | | short | branch
JZ/JE near disp16/32 | 0Fh | 84h | | short | branch
JNZ/JNE near disp16/32 | 0Fh | 85h | | short | branch
JBE/JNA near disp16/32 | 0Fh | 86h | | short | branch
JNBE/JA near disp16/32 | 0Fh | 87h | | short | branch
JS near disp16/32 | 0Fh | 88h | | short | branch
JNS near disp16/32 | 0Fh | 89h | | short | branch
JP/JPE near disp16/32 | 0Fh | 8Ah | | short | branch
JNP/JPO near disp16/32 | 0Fh | 8Bh | | short | branch
JL/JNGE near disp16/32 | 0Fh | 8Ch | | short | branch
JNL/JGE near disp16/32 | 0Fh | 8Dh | | short | branch
JLE/JNG near disp16/32 | 0Fh | 8Eh | | short | branch
JNLE/JG near disp16/32 | 0Fh | 8Fh | | short | branch
JMP near disp16/32 (direct) | E9h | | | short | branch
JMP far disp32/48 (direct) | EAh | | | vector |
JMP disp8 (short) | EBh | | | short | branch
JMP far mreg32 (indirect) | FFh | | 11-101-xxx | vector |
JMP far mem32 (indirect) | FFh | | mm-101-xxx | vector |
JMP near mreg16/32 (indirect) | FFh | | 11-100-xxx | vector |
JMP near mem16/32 (indirect) | FFh | | mm-100-xxx | vector |
LAHF | 9Fh | | | vector |
LAR reg16/32, mreg16/32 | 0Fh | 02h | 11-xxx-xxx | vector |
LAR reg16/32, mem16/32 | 0Fh | 02h | mm-xxx-xxx | vector |
LDS reg16/32, mem32/48 | C5h | | mm-xxx-xxx | vector |
LEA reg16/32, mem16/32 | 8Dh | | mm-xxx-xxx | short | load, alu
LEAVE | C9h | | | long | load, alu, alu
LES reg16/32, mem32/48 | C4h | | mm-xxx-xxx | vector |
LFS reg16/32, mem32/48 | 0Fh | B4h | | vector |
LGDT mem48 | 0Fh | 01h | mm-010-xxx | vector |
LGS reg16/32, mem32/48 | 0Fh | B5h | | vector |
LIDT mem48 | 0Fh | 01h | mm-011-xxx | vector |
LLDT mreg16 | 0Fh | 00h | 11-010-xxx | vector |
LLDT mem16 | 0Fh | 00h | mm-010-xxx | vector |
LMSW mreg16 | 0Fh | 01h | 11-100-xxx | vector |
LMSW mem16 | 0Fh | 01h | mm-100-xxx | vector |
LODSB AL, mem8 | ACh | | | long | load, alu
LODSW AX, mem16 | ADh | | | long | load, alu
LODSD EAX, mem32 | ADh | | | long | load, alu
LOOP disp8 | E2h | | | short | alu, branch
LOOPE/LOOPZ disp8 | E1h | | | vector |
LOOPNE/LOOPNZ disp8 | E0h | | | vector |
LSL reg16/32, mreg16/32 | 0Fh | 03h | 11-xxx-xxx | vector |
LSL reg16/32, mem16/32 | 0Fh | 03h | mm-xxx-xxx | vector |
LSS reg16/32, mem32/48 | 0Fh | B2h | mm-xxx-xxx | vector |
LTR mreg16 | 0Fh | 00h | 11-011-xxx | vector |
LTR mem16 | 0Fh | 00h | mm-011-xxx | vector |
MOV mreg8, reg8 | 88h | | 11-xxx-xxx | short | alux
MOV mem8, reg8 | 88h | | mm-xxx-xxx | short | store

MOV mreg16/32, reg16/32 | 89h | | 11-xxx-xxx | short | alu
MOV mem16/32, reg16/32 | 89h | | mm-xxx-xxx | short | store
MOV reg8, mreg8 | 8Ah | | 11-xxx-xxx | short | alux
MOV reg8, mem8 | 8Ah | | mm-xxx-xxx | short | load
MOV reg16/32, mreg16/32 | 8Bh | | 11-xxx-xxx | short | alu
MOV reg16/32, mem16/32 | 8Bh | | mm-xxx-xxx | short | load
MOV mreg16, segment reg | 8Ch | | 11-xxx-xxx | long | load
MOV mem16, segment reg | 8Ch | | mm-xxx-xxx | vector |
MOV segment reg, mreg16 | 8Eh | | 11-xxx-xxx | vector |
MOV segment reg, mem16 | 8Eh | | mm-xxx-xxx | vector |
MOV AL, mem8 | A0h | | | short | load
MOV EAX, mem16/32 | A1h | | | short | load
MOV mem8, AL | A2h | | | short | store
MOV mem16/32, EAX | A3h | | | short | store
MOV AL, imm8 | B0h | | | short | limm
MOV CL, imm8 | B1h | | | short | limm
MOV DL, imm8 | B2h | | | short | limm
MOV BL, imm8 | B3h | | | short | limm
MOV AH, imm8 | B4h | | | short | limm
MOV CH, imm8 | B5h | | | short | limm
MOV DH, imm8 | B6h | | | short | limm
MOV BH, imm8 | B7h | | | short | limm
MOV EAX, imm16/32 | B8h | | | short | limm
MOV ECX, imm16/32 | B9h | | | short | limm
MOV EDX, imm16/32 | BAh | | | short | limm
MOV EBX, imm16/32 | BBh | | | short | limm
MOV ESP, imm16/32 | BCh | | | short | limm
MOV EBP, imm16/32 | BDh | | | short | limm
MOV ESI, imm16/32 | BEh | | | short | limm
MOV EDI, imm16/32 | BFh | | | short | limm
MOV mreg8, imm8 | C6h | | 11-000-xxx | short | limm
MOV mem8, imm8 | C6h | | mm-000-xxx | long | store
MOV mreg16/32, imm16/32 | C7h | | 11-000-xxx | short | limm
MOV mem16/32, imm16/32 | C7h | | mm-000-xxx | long | store
MOV reg32, CR0 | 0Fh | 20h | 11-000-xxx | vector |
MOV reg32, CR2 | 0Fh | 20h | 11-010-xxx | vector |
MOV reg32, CR3 | 0Fh | 20h | 11-011-xxx | vector |
MOV reg32, CR4 | 0Fh | 20h | 11-100-xxx | vector |
MOV CR0, reg32 | 0Fh | 22h | 11-000-xxx | vector |
MOV CR2, reg32 | 0Fh | 22h | 11-010-xxx | vector |
MOV CR3, reg32 | 0Fh | 22h | 11-011-xxx | vector |
MOV CR4, reg32 | 0Fh | 22h | 11-100-xxx | vector |
MOVSB mem8, mem8 | A4h | | | long | load, store, alux, alux
MOVSW mem16, mem16 | A5h | | | long | load, store, alu, alu
MOVSD mem32, mem32 | A5h | | | long | load, store, alu, alu
MOVSX reg16/32, mreg8 | 0Fh | BEh | 11-xxx-xxx | short | alu
MOVSX reg16/32, mem8 | 0Fh | BEh | mm-xxx-xxx | short | load, alu
MOVSX reg32, mreg16 | 0Fh | BFh | 11-xxx-xxx | short | alu
MOVSX reg32, mem16 | 0Fh | BFh | mm-xxx-xxx | short | load, alu
MOVZX reg16/32, mreg8 | 0Fh | B6h | 11-xxx-xxx | short | alu
MOVZX reg16/32, mem8 | 0Fh | B6h | mm-xxx-xxx | short | load, alu
MOVZX reg32, mreg16 | 0Fh | B7h | 11-xxx-xxx | short | alu
MOVZX reg32, mem16 | 0Fh | B7h | mm-xxx-xxx | short | load, alu
MUL AL, mreg8 | F6h | | 11-100-xxx | vector |
MUL AL, mem8 | F6h | | mm-100-xxx | vector |
MUL EAX, mreg16/32 | F7h | | 11-100-xxx | vector |
MUL EAX, mem16/32 | F7h | | mm-100-xxx | vector |
NEG mreg8 | F6h | | 11-011-xxx | short | alux
NEG mem8 | F6h | | mm-011-xxx | vector |
NEG mreg16/32 | F7h | | 11-011-xxx | short | alu
NEG mem16/32 | F7h | | mm-011-xxx | vector |
NOP (XCHG EAX, EAX) | 90h | | | short | limm
NOT mreg8 | F6h | | 11-010-xxx | short | alux
NOT mem8 | F6h | | mm-010-xxx | vector |
NOT mreg16/32 | F7h | | 11-010-xxx | short | alu
NOT mem16/32 | F7h | | mm-010-xxx | vector |
OR mreg8, reg8 | 08h | | 11-xxx-xxx | short | alux
OR mem8, reg8 | 08h | | mm-xxx-xxx | long | load, alux, store
OR mreg16/32, reg16/32 | 09h | | 11-xxx-xxx | short | alu
OR mem16/32, reg16/32 | 09h | | mm-xxx-xxx | long | load, alu, store
OR reg8, mreg8 | 0Ah | | 11-xxx-xxx | short | alux
OR reg8, mem8 | 0Ah | | mm-xxx-xxx | short | load, alux
OR reg16/32, mreg16/32 | 0Bh | | 11-xxx-xxx | short | alu
OR reg16/32, mem16/32 | 0Bh | | mm-xxx-xxx | short | load, alu
OR AL, imm8 | 0Ch | | | short | alux
OR EAX, imm16/32 | 0Dh | | | short | alu
OR mreg8, imm8 | 80h | | 11-001-xxx | short | alux
OR mem8, imm8 | 80h | | mm-001-xxx | long | load, alux, store
OR mreg16/32, imm16/32 | 81h | | 11-001-xxx | short | alu
OR mem16/32, imm16/32 | 81h | | mm-001-xxx | long | load, alu, store
OR mreg16/32, imm8 (signed ext.) | 83h | | 11-001-xxx | short | alux
OR mem16/32, imm8 (signed ext.) | 83h | | mm-001-xxx | long | load, alux, store
OUT imm8, AL | E6h | | | vector |
OUT imm8, AX | E7h | | | vector |
OUT imm8, EAX | E7h | | | vector |
OUT DX, AL | EEh | | | vector |
OUT DX, AX | EFh | | | vector |
OUT DX, EAX | EFh | | | vector |
POP ES | 07h | | | vector |
POP SS | 17h | | | vector |
POP DS | 1Fh | | | vector |
POP FS | 0Fh | A1h | | vector |
POP GS | 0Fh | A9h | | vector |
POP EAX | 58h | | | short | load, alu
POP ECX | 59h | | | short | load, alu
POP EDX | 5Ah | | | short | load, alu
POP EBX | 5Bh | | | short | load, alu
POP ESP | 5Ch | | | short | load, alu
POP EBP | 5Dh | | | short | load, alu


































AMD-K6®-2 Processor Data Sheet 21850J/0 - February 2000




























































































































POP ESI | 5Eh | | | short | load, alu
POP EDI | 5Fh | | | short | load, alu
POP mreg16/32 | 8Fh | | 11-000-xxx | short | load, alu
POP mem16/32 | 8Fh | | mm-000-xxx | long | load, store, alu
POPA/POPAD | 61h | | | vector |
POPF/POPFD | 9Dh | | | vector |
PUSH ES | 06h | | | long | load, store
PUSH CS | 0Eh | | | vector |
PUSH FS | 0Fh | A0h | | vector |
PUSH GS | 0Fh | A8h | | vector |
PUSH SS | 16h | | | vector |
PUSH DS | 1Eh | | | long | load, store
PUSH EAX | 50h | | | short | store
PUSH ECX | 51h | | | short | store
PUSH EDX | 52h | | | short | store
PUSH EBX | 53h | | | short | store
PUSH ESP | 54h | | | short | store
PUSH EBP | 55h | | | short | store
PUSH ESI | 56h | | | short | store
PUSH EDI | 57h | | | short | store
PUSH imm8 | 6Ah | | | long | store
PUSH imm16/32 | 68h | | | long | store
PUSH mreg16/32 | FFh | | 11-110-xxx | vector |
PUSH mem16/32 | FFh | | mm-110-xxx | long | load, store
PUSHA/PUSHAD | 60h | | | vector |
PUSHF/PUSHFD | 9Ch | | | vector |
RCL mreg8, imm8 | C0h | | 11-010-xxx | vector |
RCL mem8, imm8 | C0h | | mm-010-xxx | vector |
RCL mreg16/32, imm8 | C1h | | 11-010-xxx | vector |
RCL mem16/32, imm8 | C1h | | mm-010-xxx | vector |
RCL mreg8, 1 | D0h | | 11-010-xxx | vector |
RCL mem8, 1 | D0h | | mm-010-xxx | vector |
RCL mreg16/32, 1 | D1h | | 11-010-xxx | vector |
RCL mem16/32, 1 | D1h | | mm-010-xxx | vector |
RCL mreg8, CL | D2h | | 11-010-xxx | vector |
RCL mem8, CL | D2h | | mm-010-xxx | vector |
RCL mreg16/32, CL | D3h | | 11-010-xxx | vector |
RCL mem16/32, CL | D3h | | mm-010-xxx | vector |
RCR mreg8, imm8 | C0h | | 11-011-xxx | vector |
RCR mem8, imm8 | C0h | | mm-011-xxx | vector |
RCR mreg16/32, imm8 | C1h | | 11-011-xxx | vector |
RCR mem16/32, imm8 | C1h | | mm-011-xxx | vector |
RCR mreg8, 1 | D0h | | 11-011-xxx | vector |
RCR mem8, 1 | D0h | | mm-011-xxx | vector |
RCR mreg16/32, 1 | D1h | | 11-011-xxx | vector |
RCR mem16/32, 1 | D1h | | mm-011-xxx | vector |
RCR mreg8, CL | D2h | | 11-011-xxx | vector |
RCR mem8, CL | D2h | | mm-011-xxx | vector |
RCR mreg16/32, CL | D3h | | 11-011-xxx | vector |
RCR mem16/32, CL | D3h | | mm-011-xxx | vector |
RDMSR | 0Fh | 32h | | vector |
RDTSC | 0Fh | 31h | | vector |
RET near imm16 | C2h | | | vector |
RET near | C3h | | | vector |
RET far imm16 | CAh | | | vector |
RET far | CBh | | | vector |
ROL mreg8, imm8 | C0h | | 11-000-xxx | vector |
ROL mem8, imm8 | C0h | | mm-000-xxx | vector |
ROL mreg16/32, imm8 | C1h | | 11-000-xxx | vector |
ROL mem16/32, imm8 | C1h | | mm-000-xxx | vector |
ROL mreg8, 1 | D0h | | 11-000-xxx | vector |
ROL mem8, 1 | D0h | | mm-000-xxx | vector |
ROL mreg16/32, 1 | D1h | | 11-000-xxx | vector |
ROL mem16/32, 1 | D1h | | mm-000-xxx | vector |
ROL mreg8, CL | D2h | | 11-000-xxx | vector |
ROL mem8, CL | D2h | | mm-000-xxx | vector |

ROL mreg16/32, CL | D3h | | 11-000-xxx | vector |
ROL mem16/32, CL | D3h | | mm-000-xxx | vector |
ROR mreg8, imm8 | C0h | | 11-001-xxx | vector |
ROR mem8, imm8 | C0h | | mm-001-xxx | vector |
ROR mreg16/32, imm8 | C1h | | 11-001-xxx | vector |
ROR mem16/32, imm8 | C1h | | mm-001-xxx | vector |
ROR mreg8, 1 | D0h | | 11-001-xxx | vector |
ROR mem8, 1 | D0h | | mm-001-xxx | vector |
ROR mreg16/32, 1 | D1h | | 11-001-xxx | vector |
ROR mem16/32, 1 | D1h | | mm-001-xxx | vector |
ROR mreg8, CL | D2h | | 11-001-xxx | vector |
ROR mem8, CL | D2h | | mm-001-xxx | vector |
ROR mreg16/32, CL | D3h | | 11-001-xxx | vector |
ROR mem16/32, CL | D3h | | mm-001-xxx | vector |
RSM | 0Fh | AAh | | vector |
SAHF | 9Eh | | | vector |
SAR mreg8, imm8 | C0h | | 11-111-xxx | short | alux
SAR mem8, imm8 | C0h | | mm-111-xxx | vector |
SAR mreg16/32, imm8 | C1h | | 11-111-xxx | short | alu
SAR mem16/32, imm8 | C1h | | mm-111-xxx | vector |
SAR mreg8, 1 | D0h | | 11-111-xxx | short | alux
SAR mem8, 1 | D0h | | mm-111-xxx | vector |
SAR mreg16/32, 1 | D1h | | 11-111-xxx | short | alu
SAR mem16/32, 1 | D1h | | mm-111-xxx | vector |
SAR mreg8, CL | D2h | | 11-111-xxx | short | alux
SAR mem8, CL | D2h | | mm-111-xxx | vector |
SAR mreg16/32, CL | D3h | | 11-111-xxx | short | alu
SAR mem16/32, CL | D3h | | mm-111-xxx | vector |
SBB mreg8, reg8 | 18h | | 11-xxx-xxx | vector |
SBB mem8, reg8 | 18h | | mm-xxx-xxx | vector |
SBB mreg16/32, reg16/32 | 19h | | 11-xxx-xxx | vector |
SBB mem16/32, reg16/32 | 19h | | mm-xxx-xxx | vector |
SBB reg8, mreg8 | 1Ah | | 11-xxx-xxx | vector |
SBB reg8, mem8 | 1Ah | | mm-xxx-xxx | vector |
SBB reg16/32, mreg16/32 | 1Bh | | 11-xxx-xxx | vector |
SBB reg16/32, mem16/32 | 1Bh | | mm-xxx-xxx | vector |
SBB AL, imm8 | 1Ch | | | vector |
SBB EAX, imm16/32 | 1Dh | | | vector |
SBB mreg8, imm8 | 80h | | 11-011-xxx | vector |
SBB mem8, imm8 | 80h | | mm-011-xxx | vector |
SBB mreg16/32, imm16/32 | 81h | | 11-011-xxx | vector |
SBB mem16/32, imm16/32 | 81h | | mm-011-xxx | vector |
SBB mreg16/32, imm8 (signed ext.) | 83h | | 11-011-xxx | vector |
SBB mem16/32, imm8 (signed ext.) | 83h | | mm-011-xxx | vector |
SCASB AL, mem8 | AEh | | | vector |
SCASW AX, mem16 | AFh | | | vector |
SCASD EAX, mem32 | AFh | | | vector |
SETO mreg8 | 0Fh | 90h | 11-xxx-xxx | vector |
SETO mem8 | 0Fh | 90h | mm-xxx-xxx | vector |
SETNO mreg8 | 0Fh | 91h | 11-xxx-xxx | vector |
SETNO mem8 | 0Fh | 91h | mm-xxx-xxx | vector |
SETB/SETNAE mreg8 | 0Fh | 92h | 11-xxx-xxx | vector |
SETB/SETNAE mem8 | 0Fh | 92h | mm-xxx-xxx | vector |
SETNB/SETAE mreg8 | 0Fh | 93h | 11-xxx-xxx | vector |
SETNB/SETAE mem8 | 0Fh | 93h | mm-xxx-xxx | vector |
SETZ/SETE mreg8 | 0Fh | 94h | 11-xxx-xxx | vector |
SETZ/SETE mem8 | 0Fh | 94h | mm-xxx-xxx | vector |
SETNZ/SETNE mreg8 | 0Fh | 95h | 11-xxx-xxx | vector |
SETNZ/SETNE mem8 | 0Fh | 95h | mm-xxx-xxx | vector |
SETBE/SETNA mreg8 | 0Fh | 96h | 11-xxx-xxx | vector |
SETBE/SETNA mem8 | 0Fh | 96h | mm-xxx-xxx | vector |
SETNBE/SETA mreg8 | 0Fh | 97h | 11-xxx-xxx | vector |
SETNBE/SETA mem8 | 0Fh | 97h | mm-xxx-xxx | vector |
SETS mreg8 | 0Fh | 98h | 11-xxx-xxx | vector |
SETS mem8 | 0Fh | 98h | mm-xxx-xxx | vector |
SETNS mreg8 | 0Fh | 99h | 11-xxx-xxx | vector |

SETNS mem8 | 0Fh | 99h | mm-xxx-xxx | vector |
SETP/SETPE mreg8 | 0Fh | 9Ah | 11-xxx-xxx | vector |
SETP/SETPE mem8 | 0Fh | 9Ah | mm-xxx-xxx | vector |
SETNP/SETPO mreg8 | 0Fh | 9Bh | 11-xxx-xxx | vector |
SETNP/SETPO mem8 | 0Fh | 9Bh | mm-xxx-xxx | vector |
SETL/SETNGE mreg8 | 0Fh | 9Ch | 11-xxx-xxx | vector |
SETL/SETNGE mem8 | 0Fh | 9Ch | mm-xxx-xxx | vector |
SETNL/SETGE mreg8 | 0Fh | 9Dh | 11-xxx-xxx | vector |
SETNL/SETGE mem8 | 0Fh | 9Dh | mm-xxx-xxx | vector |
SETLE/SETNG mreg8 | 0Fh | 9Eh | 11-xxx-xxx | vector |
SETLE/SETNG mem8 | 0Fh | 9Eh | mm-xxx-xxx | vector |
SETNLE/SETG mreg8 | 0Fh | 9Fh | 11-xxx-xxx | vector |
SETNLE/SETG mem8 | 0Fh | 9Fh | mm-xxx-xxx | vector |
SGDT mem48 | 0Fh | 01h | mm-000-xxx | vector |
SIDT mem48 | 0Fh | 01h | mm-001-xxx | vector |
SHL/SAL mreg8, imm8 | C0h | | 11-100-xxx | short | alux
SHL/SAL mem8, imm8 | C0h | | mm-100-xxx | vector |
SHL/SAL mreg16/32, imm8 | C1h | | 11-100-xxx | short | alu
SHL/SAL mem16/32, imm8 | C1h | | mm-100-xxx | vector |
SHL/SAL mreg8, 1 | D0h | | 11-100-xxx | short | alux
SHL/SAL mem8, 1 | D0h | | mm-100-xxx | vector |
SHL/SAL mreg16/32, 1 | D1h | | 11-100-xxx | short | alu
SHL/SAL mem16/32, 1 | D1h | | mm-100-xxx | vector |
SHL/SAL mreg8, CL | D2h | | 11-100-xxx | short | alux
SHL/SAL mem8, CL | D2h | | mm-100-xxx | vector |
SHL/SAL mreg16/32, CL | D3h | | 11-100-xxx | short | alu
SHL/SAL mem16/32, CL | D3h | | mm-100-xxx | vector |
SHR mreg8, imm8 | C0h | | 11-101-xxx | short | alux
SHR mem8, imm8 | C0h | | mm-101-xxx | vector |
SHR mreg16/32, imm8 | C1h | | 11-101-xxx | short | alu
SHR mem16/32, imm8 | C1h | | mm-101-xxx | vector |
SHR mreg8, 1 | D0h | | 11-101-xxx | short | alux
SHR mem8, 1 | D0h | | mm-101-xxx | vector |
SHR mreg16/32, 1 Dih 11-101-xxx short | alu 
SHR mem16/32, 1 Dih mm-101-xxx | vector 
SHR mregg, CL D2h 11-101-xxx short | alux 
SHR mems, CL D2h mm-101-xxx | vector 
SHR mreg16/32, CL D3h 11-101-xxx short | alu 
SHR mem16/32, CL D3h mm-101-xxx | vector 
SHLD mreg16/32, reg16/32, imm8 | 0Fh | A4h | 11-xxx-xxx | vector
SHLD mem16/32, reg16/32, imm8 | 0Fh | A4h | mm-xxx-xxx | vector
SHLD mreg16/32, reg16/32, CL | 0Fh | A5h | 11-xxx-xxx | vector
SHLD mem16/32, reg16/32, CL | 0Fh | A5h | mm-xxx-xxx | vector
SHRD mreg16/32, reg16/32, imm8 | 0Fh | ACh | 11-xxx-xxx | vector
SHRD mem16/32, reg16/32, imm8 | 0Fh | ACh | mm-xxx-xxx | vector
SHRD mreg16/32, reg16/32, CL | 0Fh | ADh | 11-xxx-xxx | vector
SHRD mem16/32, reg16/32, CL | 0Fh | ADh | mm-xxx-xxx | vector
SLDT mreg16 | 0Fh | 00h | 11-000-xxx | vector
SLDT mem16 | 0Fh | 00h | mm-000-xxx | vector
SMSW mreg16 | 0Fh | 01h | 11-100-xxx | vector
SMSW mem16 | 0Fh | 01h | mm-100-xxx | vector
STC | F9h | | | vector
STD | FDh | | | vector
STI | FBh | | | vector
STOSB mem8, AL | AAh | | | long | store, alux
STOSW mem16, AX | ABh | | | long | store, alux
STOSD mem32, EAX | ABh | | | long | store, alux
STR mreg16 | 0Fh | 00h | 11-001-xxx | vector
STR mem16 | 0Fh | 00h | mm-001-xxx | vector
SUB mreg8, reg8 | 28h | | 11-xxx-xxx | short | alux
SUB mem8, reg8 | 28h | | mm-xxx-xxx | long | load, alux, store
SUB mreg16/32, reg16/32 | 29h | | 11-xxx-xxx | short | alu
SUB mem16/32, reg16/32 | 29h | | mm-xxx-xxx | long | load, alu, store
SUB reg8, mreg8 | 2Ah | | 11-xxx-xxx | short | alux
SUB reg8, mem8 | 2Ah | | mm-xxx-xxx | short | load, alux
SUB reg16/32, mreg16/32 | 2Bh | | 11-xxx-xxx | short | alu
SUB reg16/32, mem16/32 | 2Bh | | mm-xxx-xxx | short | load, alu
SUB AL, imm8 | 2Ch | | | short | alux
SUB EAX, imm16/32 | 2Dh | | | short | alu
SUB mreg8, imm8 | 80h | | 11-101-xxx | short | alux
SUB mem8, imm8 | 80h | | mm-101-xxx | long | load, alux, store
SUB mreg16/32, imm16/32 | 81h | | 11-101-xxx | short | alu
SUB mem16/32, imm16/32 | 81h | | mm-101-xxx | long | load, alu, store
SUB mreg16/32, imm8 (signed ext.) | 83h | | 11-101-xxx | short | alux
SUB mem16/32, imm8 (signed ext.) | 83h | | mm-101-xxx | long | load, alux, store
SYSCALL | 0Fh | 05h | | vector
SYSRET | 0Fh | 07h | | vector
TEST mreg8, reg8 | 84h | | 11-xxx-xxx | short | alux
TEST mem8, reg8 | 84h | | mm-xxx-xxx | vector
TEST mreg16/32, reg16/32 | 85h | | 11-xxx-xxx | short | alu
TEST mem16/32, reg16/32 | 85h | | mm-xxx-xxx | vector
TEST AL, imm8 | A8h | | | long | alux
TEST EAX, imm16/32 | A9h | | | long | alu
TEST mreg8, imm8 | F6h | | 11-000-xxx | long | alux
TEST mem8, imm8 | F6h | | mm-000-xxx | long | load, alux
TEST mreg16/32, imm16/32 | F7h | | 11-000-xxx | long | alu
TEST mem16/32, imm16/32 | F7h | | mm-000-xxx | long | load, alu
VERR mreg16 | 0Fh | 00h | 11-100-xxx | vector
VERR mem16 | 0Fh | 00h | mm-100-xxx | vector
VERW mreg16 | 0Fh | 00h | 11-101-xxx | vector
VERW mem16 | 0Fh | 00h | mm-101-xxx | vector
WAIT | 9Bh | | | vector
WBINVD | 0Fh | 09h | | vector
WRMSR | 0Fh | 30h | | vector
XADD mreg8, reg8 | 0Fh | C0h | 11-100-xxx | vector
XADD mem8, reg8 | 0Fh | C0h | mm-100-xxx | vector
XADD mreg16/32, reg16/32 | 0Fh | C1h | 11-101-xxx | vector
XADD mem16/32, reg16/32 | 0Fh | C1h | mm-101-xxx | vector
XCHG reg8, mreg8 | 86h | | 11-xxx-xxx | vector
XCHG reg8, mem8 | 86h | | mm-xxx-xxx | vector
XCHG reg16/32, mreg16/32 | 87h | | 11-xxx-xxx | vector
XCHG reg16/32, mem16/32 | 87h | | mm-xxx-xxx | vector
XCHG EAX, EAX | 90h | | | short | limm
XCHG EAX, ECX | 91h | | | long | alu, alu, alu
XCHG EAX, EDX | 92h | | | long | alu, alu, alu
XCHG EAX, EBX | 93h | | | long | alu, alu, alu
XCHG EAX, ESP | 94h | | | long | alu, alu, alu
XCHG EAX, EBP | 95h | | | long | alu, alu, alu
XCHG EAX, ESI | 96h | | | long | alu, alu, alu
XCHG EAX, EDI | 97h | | | long | alu, alu, alu
XLAT | D7h | | | vector
XOR mreg8, reg8 | 30h | | 11-xxx-xxx | short | alux
XOR mem8, reg8 | 30h | | mm-xxx-xxx | long | load, alux, store
XOR mreg16/32, reg16/32 | 31h | | 11-xxx-xxx | short | alu
XOR mem16/32, reg16/32 | 31h | | mm-xxx-xxx | long | load, alu, store
XOR reg8, mreg8 | 32h | | 11-xxx-xxx | short | alux
XOR reg8, mem8 | 32h | | mm-xxx-xxx | short | load, alux
XOR reg16/32, mreg16/32 | 33h | | 11-xxx-xxx | short | alu
XOR reg16/32, mem16/32 | 33h | | mm-xxx-xxx | short | load, alu
XOR AL, imm8 | 34h | | | short | alux
XOR EAX, imm16/32 | 35h | | | short | alu
XOR mreg8, imm8 | 80h | | 11-110-xxx | short | alux
XOR mem8, imm8 | 80h | | mm-110-xxx | long | load, alux, store
XOR mreg16/32, imm16/32 | 81h | | 11-110-xxx | short | alu
XOR mem16/32, imm16/32 | 81h | | mm-110-xxx | long | load, alu, store
XOR mreg16/32, imm8 (signed ext.) | 83h | | 11-110-xxx | short | alux
XOR mem16/32, imm8 (signed ext.) | 83h | | mm-110-xxx | long | load, alux, store

* The last three bits of the modR/M byte select the stack entry ST(i).

Instruction Mnemonic | First Byte | Second Byte | ModR/M Byte | Decode Type | RISC86 Operations | Note
F2XM1 | D9h | F0h | | short | float
FABS | D9h | F1h | | short | float
FADD ST(0), ST(i) | D8h | | 11-000-xxx | short | float | *
FADD ST(0), mem32real | D8h | | mm-000-xxx | short | fload, float
FADD ST(i), ST(0) | DCh | | 11-000-xxx | short | float | *
FADD ST(0), mem64real | DCh | | mm-000-xxx | short | fload, float
FADDP ST(1), ST(0) | DEh | | 11-000-xxx | short | float | *
FBLD | DFh | | mm-100-xxx | vector
FBSTP | DFh | | mm-110-xxx | vector
FCHS | D9h | E0h | | short | float
FCLEX | DBh | E2h | | vector
FCOM ST(0), ST(i) | D8h | | 11-010-xxx | short | float | *
FCOM ST(0), mem32real | D8h | | mm-010-xxx | short | fload, float
FCOM ST(0), mem64real | DCh | | mm-010-xxx | short | fload, float
FCOMP ST(0), ST(i) | D8h | | 11-011-xxx | short | float | *
FCOMP ST(0), mem32real | D8h | | mm-011-xxx | short | fload, float
FCOMP ST(0), mem64real | DCh | | mm-011-xxx | short | fload, float
FCOMPP | DEh | D9h | 11-011-001 | short | float
FCOS | D9h | FFh | | short | float
FDECSTP | D9h | F6h | | short | float
FDIV ST(0), ST(i) (single precision) | D8h | | 11-110-xxx | short | float | *
FDIV ST(0), ST(i) (double precision) | D8h | | 11-110-xxx | short | float | *
FDIV ST(0), ST(i) (extended precision) | D8h | | 11-110-xxx | short | float
FDIV ST(i), ST(0) (single precision) | DCh | | 11-111-xxx | short | float | *
FDIV ST(i), ST(0) (double precision) | DCh | | 11-111-xxx | short | float | *
FDIV ST(i), ST(0) (extended precision) | DCh | | 11-111-xxx | short | float | *
FDIV ST(0), mem32real | D8h | | mm-110-xxx | short | fload, float
FDIV ST(0), mem64real | DCh | | mm-110-xxx | short | fload, float
FDIVP ST(0), ST(i) | DEh | | 11-111-xxx | short | float | *
FDIVR ST(0), ST(i) | D8h | | 11-110-xxx | short | float | *
FDIVR ST(i), ST(0) | DCh | | 11-111-xxx | short | float | *
FDIVR ST(0), mem32real | D8h | | mm-111-xxx | short | fload, float
FDIVR ST(0), mem64real | DCh | | mm-111-xxx | short | fload, float
FDIVRP ST(1), ST(0) | DEh | | 11-110-xxx | short | float
FFREE ST(i) | DDh | | 11-000-xxx | short | float
FIADD ST(0), mem32int | DAh | | mm-000-xxx | short | fload, float
FIADD ST(0), mem16int | DEh | | mm-000-xxx | short | fload, float
FICOM ST(0), mem32int | DAh | | mm-010-xxx | short | fload, float
FICOM ST(0), mem16int | DEh | | mm-010-xxx | short | fload, float
FICOMP ST(0), mem32int | DAh | | mm-011-xxx | short | fload, float
FICOMP ST(0), mem16int | DEh | | mm-011-xxx | short | fload, float
FIDIV ST(0), mem32int | DAh | | mm-110-xxx | short | fload, float
FIDIV ST(0), mem16int | DEh | | mm-110-xxx | short | fload, float
FIDIVR ST(0), mem32int | DAh | | mm-111-xxx | short | fload, float
FIDIVR ST(0), mem16int | DEh | | mm-111-xxx | short | fload, float
FILD mem16int | DFh | | mm-000-xxx | short | fload, float
FILD mem32int | DBh | | mm-000-xxx | short | fload, float
FILD mem64int | DFh | | mm-101-xxx | short | fload, float
FIMUL ST(0), mem32int | DAh | | mm-001-xxx | short | fload, float
FIMUL ST(0), mem16int | DEh | | mm-001-xxx | short | fload, float
FINCSTP | D9h | F7h | | short
FINIT | DBh | E3h | | vector
FIST mem16int | DFh | | mm-010-xxx | short | fload, float
FIST mem32int | DBh | | mm-010-xxx | short | fload, float
FISTP mem16int | DFh | | mm-011-xxx | short | fload, float
FISTP mem32int | DBh | | mm-011-xxx | short | fload, float
FISTP mem64int | DFh | | mm-111-xxx | short | fload, float
FISUB ST(0), mem32int | DAh | | mm-100-xxx | short | fload, float
FISUB ST(0), mem16int | DEh | | mm-100-xxx | short | fload, float
FISUBR ST(0), mem32int | DAh | | mm-101-xxx | short | fload, float
FISUBR ST(0), mem16int | DEh | | mm-101-xxx | short | fload, float
FLD ST(i) | D9h | | 11-000-xxx | short | fload, float
FLD mem32real | D9h | | mm-000-xxx | short | fload, float
FLD mem64real | DDh | | mm-000-xxx | short | fload, float
FLD mem80real | DBh | | mm-101-xxx | vector
FLD1 | D9h | E8h | | short | fload, float
FLDCW | D9h | | mm-101-xxx | vector
FLDENV | D9h | | mm-100-xxx | short | fload, float
FLDL2E | D9h | EAh | | short | float
FLDL2T | D9h | E9h | | short | float
FLDLG2 | D9h | ECh | | short | float
FLDLN2 | D9h | EDh | | short | float
FLDPI | D9h | EBh | | short | float
FLDZ | D9h | EEh | | short | float
FMUL ST(0), ST(i) | D8h | | 11-001-xxx | short | float | *
FMUL ST(i), ST(0) | DCh | | 11-001-xxx | short | float | *
FMUL ST(0), mem32real | D8h | | mm-001-xxx | short | fload, float
FMUL ST(0), mem64real | DCh | | mm-001-xxx | short | fload, float
FMULP ST(0), ST(1) | DEh | | 11-001-xxx | short | float | *
FNOP | D9h | D0h | | short | float
FPATAN | D9h | F3h | | short | float
FPREM | D9h | F8h | | short | float
FPREM1 | D9h | F5h | | short | float
FPTAN | D9h | F2h | | vector
FRNDINT | D9h | FCh | | short | float
FRSTOR | DDh | | mm-100-xxx | vector
FSAVE | DDh | | mm-110-xxx | vector
FSCALE | D9h | FDh | | short | float
FSIN | D9h | FEh | | short | float
FSINCOS | D9h | FBh | | vector
FSQRT (single precision) | D9h | FAh | | short | float
FSQRT (double precision) | D9h | FAh | | short | float
FSQRT (extended precision) | D9h | FAh | | short | float
FST mem32real | D9h | | mm-010-xxx | short | fstore
FST mem64real | DDh | | mm-010-xxx | short | fstore
FST ST(i) | DDh | | 11-010-xxx | short | fstore
FSTCW | D9h | | mm-111-xxx | vector
FSTENV | D9h | | mm-110-xxx | vector
FSTP mem32real | D9h | | mm-011-xxx | short | fstore
FSTP mem64real | DDh | | mm-011-xxx | short | fstore
FSTP mem80real | D9h | | mm-111-xxx | vector
FSTP ST(i) | DDh | | 11-011-xxx | short | float
FSTSW AX | DFh | E0h | | vector
FSTSW mem16 | DDh | | mm-111-xxx | vector
FSUB ST(0), mem32real | D8h | | mm-100-xxx | short | fload, float
FSUB ST(0), mem64real | DCh | | mm-100-xxx | short | fload, float
FSUB ST(0), ST(i) | D8h | | 11-100-xxx | short | float
FSUB ST(i), ST(0) | DCh | | 11-101-xxx | short | float
FSUBP ST(0), ST(i) | DEh | | 11-101-xxx | short | float
FSUBR ST(0), mem32real | D8h | | mm-101-xxx | short | fload, float
FSUBR ST(0), mem64real | DCh | | mm-101-xxx | short | fload, float
FSUBR ST(0), ST(i) | D8h | | 11-100-xxx | short | float
FSUBR ST(i), ST(0) | DCh | | 11-101-xxx | short | float
FSUBRP ST(i), ST(0) | DEh | | 11-100-xxx | short | float
FTST | D9h | E4h | | short | float
FUCOM | DDh | | 11-100-xxx | short | float
FUCOMP | DDh | | 11-101-xxx | short | float
FUCOMPP | DAh | E9h | | short | float
FXAM | D9h | E5h | | short | float
FXCH | D9h | | 11-001-xxx | short | float
FXTRACT | D9h | F4h | | vector
FYL2X | D9h | F1h | | short | float
FYL2XP1 | D9h | F9h | | short | float
FWAIT | 9Bh | | | vector

Instruction Mnemonic | Prefix Byte(s) | First Byte | ModR/M Byte | Decode Type | RISC86 Operations | Note
EMMS | 0Fh | 77h | | vector
MOVD mmreg, mreg32 | 0Fh | 6Eh | 11-xxx-xxx | short | meu | **
MOVD mmreg, mem32 | 0Fh | 6Eh | mm-xxx-xxx | short | mload
MOVD mreg32, mmreg | 0Fh | 7Eh | 11-xxx-xxx | short | mstore, load | **
MOVD mem32, mmreg | 0Fh | 7Eh | mm-xxx-xxx | short | mstore
MOVQ mmreg1, mmreg2 | 0Fh | 6Fh | 11-xxx-xxx | short | meu
MOVQ mmreg, mem64 | 0Fh | 6Fh | mm-xxx-xxx | short | mload
MOVQ mmreg2, mmreg1 | 0Fh | 7Fh | 11-xxx-xxx | short | meu
MOVQ mem64, mmreg | 0Fh | 7Fh | mm-xxx-xxx | short | mstore
PACKSSDW mmreg1, mmreg2 | 0Fh | 6Bh | 11-xxx-xxx | short | meu
PACKSSDW mmreg, mem64 | 0Fh | 6Bh | mm-xxx-xxx | short | mload, meu
PACKSSWB mmreg1, mmreg2 | 0Fh | 63h | 11-xxx-xxx | short | meu
PACKSSWB mmreg, mem64 | 0Fh | 63h | mm-xxx-xxx | short | mload, meu
PACKUSWB mmreg1, mmreg2 | 0Fh | 67h | 11-xxx-xxx | short | meu
PACKUSWB mmreg, mem64 | 0Fh | 67h | mm-xxx-xxx | short | mload, meu
PADDB mmreg1, mmreg2 | 0Fh | FCh | 11-xxx-xxx | short | meu
PADDB mmreg, mem64 | 0Fh | FCh | mm-xxx-xxx | short | mload, meu
PADDD mmreg1, mmreg2 | 0Fh | FEh | 11-xxx-xxx | short | meu
PADDD mmreg, mem64 | 0Fh | FEh | mm-xxx-xxx | short | mload, meu
PADDSB mmreg1, mmreg2 | 0Fh | ECh | 11-xxx-xxx | short | meu
PADDSB mmreg, mem64 | 0Fh | ECh | mm-xxx-xxx | short | mload, meu
PADDSW mmreg1, mmreg2 | 0Fh | EDh | 11-xxx-xxx | short | meu
PADDSW mmreg, mem64 | 0Fh | EDh | mm-xxx-xxx | short | mload, meu
PADDUSB mmreg1, mmreg2 | 0Fh | DCh | 11-xxx-xxx | short | meu
PADDUSB mmreg, mem64 | 0Fh | DCh | mm-xxx-xxx | short | mload, meu
PADDUSW mmreg1, mmreg2 | 0Fh | DDh | 11-xxx-xxx | short | meu
PADDUSW mmreg, mem64 | 0Fh | DDh | mm-xxx-xxx | short | mload, meu
PADDW mmreg1, mmreg2 | 0Fh | FDh | 11-xxx-xxx | short | meu
PADDW mmreg, mem64 | 0Fh | FDh | mm-xxx-xxx | short | mload, meu
PAND mmreg1, mmreg2 | 0Fh | DBh | 11-xxx-xxx | short | meu
PAND mmreg, mem64 | 0Fh | DBh | mm-xxx-xxx | short | mload, meu
Note: 
** Bits 2, 1, and 0 of the modR/M byte select the integer register. 
21850)/0—February 2000 AMD-K6®-2 Processor Data Sheet 


Table 16. MMX™ Instructions (continued)

Instruction Mnemonic | Prefix Byte(s) | First Byte | ModR/M Byte | Decode Type | RISC86 Operations | Note
PANDN mmreg1, mmreg2 | 0Fh | DFh | 11-xxx-xxx | short | meu
PANDN mmreg, mem64 | 0Fh | DFh | mm-xxx-xxx | short | mload, meu
PCMPEQB mmreg1, mmreg2 | 0Fh | 74h | 11-xxx-xxx | short | meu
PCMPEQB mmreg, mem64 | 0Fh | 74h | mm-xxx-xxx | short | mload, meu
PCMPEQD mmreg1, mmreg2 | 0Fh | 76h | 11-xxx-xxx | short | meu
PCMPEQD mmreg, mem64 | 0Fh | 76h | mm-xxx-xxx | short | mload, meu
PCMPEQW mmreg1, mmreg2 | 0Fh | 75h | 11-xxx-xxx | short | meu
PCMPEQW mmreg, mem64 | 0Fh | 75h | mm-xxx-xxx | short | mload, meu
PCMPGTB mmreg1, mmreg2 | 0Fh | 64h | 11-xxx-xxx | short | meu
PCMPGTB mmreg, mem64 | 0Fh | 64h | mm-xxx-xxx | short | mload, meu
PCMPGTD mmreg1, mmreg2 | 0Fh | 66h | 11-xxx-xxx | short | meu
PCMPGTD mmreg, mem64 | 0Fh | 66h | mm-xxx-xxx | short | mload, meu
PCMPGTW mmreg1, mmreg2 | 0Fh | 65h | 11-xxx-xxx | short | meu
PCMPGTW mmreg, mem64 | 0Fh | 65h | mm-xxx-xxx | short | mload, meu
PMADDWD mmreg1, mmreg2 | 0Fh | F5h | 11-xxx-xxx | short | meu
PMADDWD mmreg, mem64 | 0Fh | F5h | mm-xxx-xxx | short | mload, meu
PMULHW mmreg1, mmreg2 | 0Fh | E5h | 11-xxx-xxx | short | meu
PMULHW mmreg, mem64 | 0Fh | E5h | mm-xxx-xxx | short | mload, meu
PMULLW mmreg1, mmreg2 | 0Fh | D5h | 11-xxx-xxx | short | meu
PMULLW mmreg, mem64 | 0Fh | D5h | mm-xxx-xxx | short | mload, meu
POR mmreg1, mmreg2 | 0Fh | EBh | 11-xxx-xxx | short | meu
POR mmreg, mem64 | 0Fh | EBh | mm-xxx-xxx | short | mload, meu
PSLLD mmreg1, mmreg2 | 0Fh | F2h | 11-xxx-xxx | short | meu
PSLLD mmreg, mem64 | 0Fh | F2h | mm-xxx-xxx | short | mload, meu
PSLLD mmreg, imm8 | 0Fh | 72h | 11-110-xxx | short | meu
PSLLQ mmreg1, mmreg2 | 0Fh | F3h | 11-xxx-xxx | short | meu
PSLLQ mmreg, mem64 | 0Fh | F3h | mm-xxx-xxx | short | mload, meu
PSLLQ mmreg, imm8 | 0Fh | 73h | 11-110-xxx | short | meu
PSLLW mmreg1, mmreg2 | 0Fh | F1h | 11-xxx-xxx | short | meu
PSLLW mmreg, mem64 | 0Fh | F1h | mm-xxx-xxx | short | mload, meu
PSLLW mmreg, imm8 | 0Fh | 71h | 11-110-xxx | short | meu
PSRAD mmreg1, mmreg2 | 0Fh | E2h | 11-xxx-xxx | short | meu
PSRAD mmreg, mem64 | 0Fh | E2h | mm-xxx-xxx | short | mload, meu
PSRAD mmreg, imm8 | 0Fh | 72h | 11-100-xxx | short | meu
PSRAW mmreg1, mmreg2 | 0Fh | E1h | 11-xxx-xxx | short | meu
PSRAW mmreg, mem64 | 0Fh | E1h | mm-xxx-xxx | short | mload, meu
PSRAW mmreg, imm8 | 0Fh | 71h | 11-100-xxx | short | meu
PSRLD mmreg1, mmreg2 | 0Fh | D2h | 11-xxx-xxx | short | meu
PSRLD mmreg, mem64 | 0Fh | D2h | mm-xxx-xxx | short | mload, meu
PSRLD mmreg, imm8 | 0Fh | 72h | 11-010-xxx | short | meu
PSRLQ mmreg1, mmreg2 | 0Fh | D3h | 11-xxx-xxx | short | meu
PSRLQ mmreg, mem64 | 0Fh | D3h | mm-xxx-xxx | short | mload, meu
PSRLQ mmreg, imm8 | 0Fh | 73h | 11-010-xxx | short | meu
PSRLW mmreg1, mmreg2 | 0Fh | D1h | 11-xxx-xxx | short | meu
PSRLW mmreg, mem64 | 0Fh | D1h | mm-xxx-xxx | short | mload, meu
PSRLW mmreg, imm8 | 0Fh | 71h | 11-010-xxx | short | meu
PSUBB mmreg1, mmreg2 | 0Fh | F8h | 11-xxx-xxx | short | meu
PSUBB mmreg, mem64 | 0Fh | F8h | mm-xxx-xxx | short | mload, meu
PSUBD mmreg1, mmreg2 | 0Fh | FAh | 11-xxx-xxx | short | meu
PSUBD mmreg, mem64 | 0Fh | FAh | mm-xxx-xxx | short | mload, meu
PSUBSB mmreg1, mmreg2 | 0Fh | E8h | 11-xxx-xxx | short | meu
PSUBSB mmreg, mem64 | 0Fh | E8h | mm-xxx-xxx | short | mload, meu
PSUBSW mmreg1, mmreg2 | 0Fh | E9h | 11-xxx-xxx | short | meu
PSUBSW mmreg, mem64 | 0Fh | E9h | mm-xxx-xxx | short | mload, meu
PSUBUSB mmreg1, mmreg2 | 0Fh | D8h | 11-xxx-xxx | short | meu
PSUBUSB mmreg, mem64 | 0Fh | D8h | mm-xxx-xxx | short | mload, meu
PSUBUSW mmreg1, mmreg2 | 0Fh | D9h | 11-xxx-xxx | short | meu
PSUBUSW mmreg, mem64 | 0Fh | D9h | mm-xxx-xxx | short | mload, meu
PSUBW mmreg1, mmreg2 | 0Fh | F9h | 11-xxx-xxx | short | meu
PSUBW mmreg, mem64 | 0Fh | F9h | mm-xxx-xxx | short | mload, meu
PUNPCKHBW mmreg1, mmreg2 | 0Fh | 68h | 11-xxx-xxx | short | meu
PUNPCKHBW mmreg, mem64 | 0Fh | 68h | mm-xxx-xxx | short | mload, meu
PUNPCKHDQ mmreg1, mmreg2 | 0Fh | 6Ah | 11-xxx-xxx | short | meu
PUNPCKHDQ mmreg, mem64 | 0Fh | 6Ah | mm-xxx-xxx | short | mload, meu
PUNPCKHWD mmreg1, mmreg2 | 0Fh | 69h | 11-xxx-xxx | short | meu
PUNPCKHWD mmreg, mem64 | 0Fh | 69h | mm-xxx-xxx | short | mload, meu
PUNPCKLBW mmreg1, mmreg2 | 0Fh | 60h | 11-xxx-xxx | short | meu
PUNPCKLBW mmreg, mem32 | 0Fh | 60h | mm-xxx-xxx | short | mload, meu
PUNPCKLDQ mmreg1, mmreg2 | 0Fh | 62h | 11-xxx-xxx | short | meu
PUNPCKLDQ mmreg, mem32 | 0Fh | 62h | mm-xxx-xxx | short | mload, meu
PUNPCKLWD mmreg1, mmreg2 | 0Fh | 61h | 11-xxx-xxx | short | meu
PUNPCKLWD mmreg, mem32 | 0Fh | 61h | mm-xxx-xxx | short | mload, meu
PXOR mmreg1, mmreg2 | 0Fh | EFh | 11-xxx-xxx | short | meu
PXOR mmreg, mem64 | 0Fh | EFh | mm-xxx-xxx | short | mload, meu
Note:
** Bits 2, 1, and 0 of the modR/M byte select the integer register.
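
The ModR/M Byte column in these tables uses bit patterns such as 11-110-xxx or mm-001-xxx. As an illustration only, the Python sketch below splits a ModR/M byte into its three fields and checks it against such a pattern. The function names are hypothetical, and the reading of "x" as a don't-care bit and "mm" as any memory-form mod value (00b, 01b, or 10b) is an assumption about the table notation rather than something stated in this section.

```python
def decode_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6
    reg = (byte >> 3) & 0b111  # bits 5-3
    rm = byte & 0b111          # bits 2-0
    return mod, reg, rm


def matches_pattern(byte: int, pattern: str) -> bool:
    """Check a ModR/M byte against a table pattern such as '11-110-xxx'.

    Assumption: 'x' is a don't-care bit and 'mm' stands for any
    memory-form mod value (00b, 01b, or 10b).
    """
    mod_pat, reg_pat, rm_pat = pattern.lower().split("-")
    mod, reg, rm = decode_modrm(byte)
    if mod_pat == "mm":
        if mod == 0b11:
            return False
    elif mod != int(mod_pat, 2):
        return False
    for value, pat in ((reg, reg_pat), (rm, rm_pat)):
        bits = format(value, "03b")
        if any(p != "x" and p != b for p, b in zip(pat, bits)):
            return False
    return True


# F1h = 11-110-001b: register form with 110b in the middle field.
print(decode_modrm(0xF1))                   # (3, 6, 1)
print(matches_pattern(0xF1, "11-110-xxx"))  # True
print(matches_pattern(0xF1, "mm-xxx-xxx"))  # False
```

In this sketch the last field returned by `decode_modrm` is the three-bit value that the table notes describe as selecting a register or stack entry.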

Table 17. 3DNow!™ Instructions
Instruction Mnemonic | Prefix Byte(s) | Opcode Byte | ModR/M Byte | Decode Type | RISC86 Operations | Note
FEMMS | 0Fh | 0Eh | | vector
PAVGUSB mmreg1, mmreg2 | 0Fh, 0Fh | BFh | 11-xxx-xxx | short | meu
PAVGUSB mmreg, mem64 | 0Fh, 0Fh | BFh | mm-xxx-xxx | short | mload, meu
PF2ID mmreg1, mmreg2 | 0Fh, 0Fh | 1Dh | 11-xxx-xxx | short | meu
PF2ID mmreg, mem64 | 0Fh, 0Fh | 1Dh | mm-xxx-xxx | short | mload, meu
PFACC mmreg1, mmreg2 | 0Fh, 0Fh | AEh | 11-xxx-xxx | short | meu
PFACC mmreg, mem64 | 0Fh, 0Fh | AEh | mm-xxx-xxx | short | mload, meu
PFADD mmreg1, mmreg2 | 0Fh, 0Fh | 9Eh | 11-xxx-xxx | short | meu
PFADD mmreg, mem64 | 0Fh, 0Fh | 9Eh | mm-xxx-xxx | short | mload, meu
PFCMPEQ mmreg1, mmreg2 | 0Fh, 0Fh | B0h | 11-xxx-xxx | short | meu
PFCMPEQ mmreg, mem64 | 0Fh, 0Fh | B0h | mm-xxx-xxx | short | mload, meu
PFCMPGE mmreg1, mmreg2 | 0Fh, 0Fh | 90h | 11-xxx-xxx | short | meu
PFCMPGE mmreg, mem64 | 0Fh, 0Fh | 90h | mm-xxx-xxx | short | mload, meu
PFCMPGT mmreg1, mmreg2 | 0Fh, 0Fh | A0h | 11-xxx-xxx | short | meu
PFCMPGT mmreg, mem64 | 0Fh, 0Fh | A0h | mm-xxx-xxx | short | mload, meu
PFMAX mmreg1, mmreg2 | 0Fh, 0Fh | A4h | 11-xxx-xxx | short | meu
PFMAX mmreg, mem64 | 0Fh, 0Fh | A4h | mm-xxx-xxx | short | mload, meu
PFMIN mmreg1, mmreg2 | 0Fh, 0Fh | 94h | 11-xxx-xxx | short | meu
PFMIN mmreg, mem64 | 0Fh, 0Fh | 94h | mm-xxx-xxx | short | mload, meu
PFMUL mmreg1, mmreg2 | 0Fh, 0Fh | B4h | 11-xxx-xxx | short | meu
PFMUL mmreg, mem64 | 0Fh, 0Fh | B4h | mm-xxx-xxx | short | mload, meu
PFRCP mmreg1, mmreg2 | 0Fh, 0Fh | 96h | 11-xxx-xxx | short | meu
PFRCP mmreg, mem64 | 0Fh, 0Fh | 96h | mm-xxx-xxx | short | mload, meu
PFRCPIT1 mmreg1, mmreg2 | 0Fh, 0Fh | A6h | 11-xxx-xxx | short | meu
PFRCPIT1 mmreg, mem64 | 0Fh, 0Fh | A6h | mm-xxx-xxx | short | mload, meu
PFRCPIT2 mmreg1, mmreg2 | 0Fh, 0Fh | B6h | 11-xxx-xxx | short | meu
PFRCPIT2 mmreg, mem64 | 0Fh, 0Fh | B6h | mm-xxx-xxx | short | mload, meu
PFRSQIT1 mmreg1, mmreg2 | 0Fh, 0Fh | A7h | 11-xxx-xxx | short | meu
PFRSQIT1 mmreg, mem64 | 0Fh, 0Fh | A7h | mm-xxx-xxx | short | mload, meu
PFRSQRT mmreg1, mmreg2 | 0Fh, 0Fh | 97h | 11-xxx-xxx | short | meu
PFRSQRT mmreg, mem64 | 0Fh, 0Fh | 97h | mm-xxx-xxx | short | mload, meu
PFSUB mmreg1, mmreg2 | 0Fh, 0Fh | 9Ah | 11-xxx-xxx | short | meu
PFSUB mmreg, mem64 | 0Fh, 0Fh | 9Ah | mm-xxx-xxx | short | mload, meu
PFSUBR mmreg1, mmreg2 | 0Fh, 0Fh | AAh | 11-xxx-xxx | short | meu
PFSUBR mmreg, mem64 | 0Fh, 0Fh | AAh | mm-xxx-xxx | short | mload, meu
PI2FD mmreg1, mmreg2 | 0Fh, 0Fh | 0Dh | 11-xxx-xxx | short | meu
PI2FD mmreg, mem64 | 0Fh, 0Fh | 0Dh | mm-xxx-xxx | short | mload, meu
PMULHRW mmreg1, mmreg2 | 0Fh, 0Fh | B7h | 11-xxx-xxx | short | meu
PMULHRW mmreg1, mem64 | 0Fh, 0Fh | B7h | mm-xxx-xxx | short | mload, meu
PREFETCH mem8 | 0Fh | 0Dh | mm-000-xxx | vector | load | 1
PREFETCHW mem8 | 0Fh | 0Dh | mm-001-xxx | vector | load | 1, 2
Notes:

1. For PREFETCH and PREFETCHW, the mem8 value refers to a byte address within the 32-byte line that will be
prefetched.

2. PREFETCHW will be implemented in a future K86 processor. On the AMD-K6-2 processor, this instruction performs in
the same manner as the PREFETCH instruction.

4.1 Signal Terminology


The following terminology is used in this chapter: 


Driven—The processor actively pulls the signal up to the 
High-voltage state or pulls the signal down to the 
Low-voltage state. 


Floated—The signal is not being driven by the processor
(high-impedance state), which allows another device to
drive this signal.


Asserted—For all active-High signals, the term asserted 
means the signal is in the High-voltage state. For all 
active-Low signals, the term asserted means the signal is in 
the Low-voltage state. 


Negated—For all active-High signals, the term negated 
means the signal is in the Low-voltage state. For all 
active-Low signals, the term negated means the signal is in 
the High-voltage state. 


Sampled—The processor has measured the state of a signal 
at predefined points in time and will take the appropriate 
action based on the state of the signal. If a signal is not 
sampled by the processor, its assertion or negation has no 
effect on the operation of the processor. 
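
The asserted and negated definitions above reduce to comparing a pin's voltage level with its polarity. The Python sketch below illustrates this, relying on the data sheet convention that a pound sign (#) in a signal name marks an active-Low signal. The function names `is_active_low` and `is_asserted` are hypothetical and used only for illustration.

```python
def is_active_low(signal_name: str) -> bool:
    """A trailing '#' in the signal name marks an active-Low signal."""
    return signal_name.endswith("#")


def is_asserted(signal_name: str, level_high: bool) -> bool:
    """Return True if the signal is asserted at the given voltage level.

    level_high is True for the High-voltage state, False for Low.
    """
    if is_active_low(signal_name):
        return not level_high
    return level_high


# ADS# is active Low, so it is asserted in the Low-voltage state.
print(is_asserted("ADS#", level_high=False))   # True
# AHOLD is active High, so it is negated in the Low-voltage state.
print(is_asserted("AHOLD", level_high=False))  # False
```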


Figure 52 on page 84 shows the signals grouped by function. The
arrows in the figure indicate the direction of the signal, either
into or out of the processor. Signals with double-headed arrows
are bidirectional. Signals with pound signs (#) are active Low.

4.2 A20M# (Address Bit 20 Mask)

Input

Summary A20M# is used to simulate the behavior of the 8086 when
running in Real mode. The assertion of A20M# causes the
processor to force bit 20 of the physical address to 0 prior to
accessing the cache or driving out a memory bus cycle. The
clearing of address bit 20 maps addresses that extend above the
8086 1-Mbyte limit to below 1 Mbyte.


Sampled The processor samples A20M# as a level-sensitive input on 
every clock edge. The system logic can drive the signal either 
synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. 


The following list explains the effects of the processor sampling 
A20M# asserted under various conditions: 


■ Inquire cycles and writeback cycles are not affected by the
state of A20M#.

■ The assertion of A20M# in System Management Mode
(SMM) is ignored.

■ When A20M# is sampled asserted in Protected mode, it
causes unpredictable processor operation. A20M# is only
defined in Real mode.

■ To ensure that A20M# is recognized before the first ADS#
occurs following the negation of RESET, A20M# must be
sampled asserted on the same clock edge that RESET is
sampled negated or on one of the two subsequent clock
edges.

■ To ensure A20M# is recognized before the execution of an
instruction, a serializing instruction must be executed
between the instruction that asserts A20M# and the
targeted instruction.
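
The address masking described in the summary can be sketched in Python as follows. This is a minimal model of the Real-mode behavior only. It ignores the SMM, inquire-cycle, and writeback exceptions listed above, and the names `A20_BIT` and `apply_a20m` are hypothetical.

```python
A20_BIT = 1 << 20  # bit 20 of the physical address


def apply_a20m(physical_address: int, a20m_asserted: bool) -> int:
    """Model the address seen by the cache and bus.

    When A20M# is asserted, bit 20 is forced to 0; otherwise the
    address passes through unchanged.
    """
    if a20m_asserted:
        return physical_address & ~A20_BIT
    return physical_address


# Real-mode segment:offset FFFFh:0010h computes to address 100000h,
# the first byte above the 8086 1-Mbyte limit.
address = (0xFFFF << 4) + 0x0010
print(hex(address))                     # 0x100000
print(hex(apply_a20m(address, True)))   # 0x0
print(hex(apply_a20m(address, False)))  # 0x100000
```

With A20M# asserted, the address wraps to the bottom of memory, as it would on an 8086.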
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4.3 A[31:3] (Address Bus) 
A[31:5] Bidirectional, A[4:3] Output

Summary A[31:3] contain the physical address for the current bus
cycle. The processor drives addresses on A[31:3] during memory and
I/O cycles, and cycle definition information during special bus
cycles. The processor samples addresses on A[31:5] during
inquire cycles.

Driven, Sampled, and Floated

As Outputs: A[31:3] are driven valid off the same clock edge as
ADS# and remain in the same state until the clock edge on
which NA# or the last expected BRDY# of the cycle is sampled
asserted. A[31:3] are driven during memory cycles, I/O cycles,
special bus cycles, and interrupt acknowledge cycles. The
processor continues to drive the address bus while the bus is
idle.


As Inputs: The processor samples A[31:5] during inquire cycles 
on the clock edge on which EADS# is sampled asserted. Even 
though A4 and A3 are not used during the inquire cycle, they 
must be driven to a valid state and must meet the same timings 
as A[31:5]. 


A[31:3] are floated off the clock edge that AHOLD or BOFF# is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 


The processor resumes driving A[31:3] off the clock edge on 
which the processor samples AHOLD or BOFF# negated and off 
the clock edge on which the processor negates HLDA. 
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4.4 ADS# (Address Strobe) 
Output

Summary The assertion of ADS# indicates the beginning of a new bus
cycle. The address bus and all cycle definition signals
corresponding to this bus cycle are driven valid off the same
clock edge as ADS#.

Driven and Floated

ADS# is asserted for one clock at the beginning of each bus
cycle. For non-pipelined cycles, ADS# can be asserted as early
as the clock edge after the clock edge on which the last
expected BRDY# of the cycle is sampled asserted, resulting in a
single idle state between cycles. For pipelined cycles, if the
processor is prepared to start a new cycle, ADS# can be asserted
as early as one clock edge after NA# is sampled asserted.


If AHOLD is sampled asserted, ADS# is only driven in order to 
perform a writeback cycle due to an inquire cycle that hits a 
modified cache line. 


The processor floats ADS# off the clock edge that BOFF# is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. 


4.5 ADSC# (Address Strobe Copy)

Output

Summary ADSC# has the same function and timing as ADS#. In the
event ADS# becomes too heavily loaded due to a large fanout in
a system, ADSC# can be used to split the load across two
outputs, which can improve system timing.

4.6 AHOLD (Address Hold)

Input

Summary AHOLD can be asserted by the system to initiate one or more
inquire cycles. To allow the system to drive the address bus
during an inquire cycle, the processor floats A[31:3] and AP off
the clock edge on which AHOLD is sampled asserted. The data
bus and all other control and status signals remain under the
control of the processor and are not floated. This allows a bus
cycle that is in progress when AHOLD is sampled asserted to
continue to completion. The processor resumes driving the
address bus off the clock edge on which AHOLD is sampled
negated.

If AHOLD is sampled asserted, ADS# is only asserted in order
to perform a writeback cycle due to an inquire cycle that hits a
modified cache line.

Sampled The processor samples AHOLD on every clock edge. AHOLD is
recognized while INIT and RESET are sampled asserted.
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4.7 AP (Address Parity) 
Bidirectional

Summary AP contains the even parity bit for cache line addresses driven
and sampled on A[31:5]. Even parity means that the total
number of 1 bits on AP and A[31:5] is even. (A4 and A3 are not
used for the generation or checking of address parity because
these bits are not required to address a cache line.) AP is driven
by the processor during processor-initiated cycles and is
sampled by the processor during inquire cycles. If AP does not
reflect even parity during an inquire cycle, the processor
asserts APCHK# to indicate an address bus parity check. The
processor does not take an internal exception as the result of
detecting an address bus parity check, and system logic must
respond appropriately to the assertion of this signal.

Driven, Sampled, and Floated

As an Output: The processor drives AP valid off the clock edge
on which ADS# is asserted until the clock edge on which NA# or
the last expected BRDY# of the cycle is sampled asserted. AP is
driven during memory cycles, I/O cycles, special bus cycles, and
interrupt acknowledge cycles. The processor continues to drive
AP while the bus is idle.


As an Input: The processor samples AP during inquire cycles on 
the clock edge on which EADS# is sampled asserted. 


The processor floats AP off the clock edge that AHOLD or
BOFF# is sampled asserted and off the clock edge that the
processor asserts HLDA in recognition of HOLD.


The processor resumes driving AP off the clock edge on which 
the processor samples AHOLD or BOFF# negated and off the 
clock edge on which the processor negates HLDA. 
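
The even-parity rule for AP can be sketched in Python as follows. The sketch computes the AP value that makes the total number of 1 bits on AP and A[31:5] even, and flags the mismatch that would lead to APCHK# being asserted during an inquire cycle. Both function names are hypothetical, and the sketch models only the parity arithmetic, not the signal timing.

```python
def address_parity_bit(address: int) -> int:
    """Return the AP value that gives even parity over AP and A[31:5].

    A[4:0] are ignored: A4 and A3 are excluded from address parity,
    and the lower bits are not part of the address bus.
    """
    line_address = (address >> 5) & ((1 << 27) - 1)  # A[31:5]
    ones = bin(line_address).count("1")
    return ones & 1  # 1 only when A[31:5] holds an odd number of 1 bits


def parity_check_fails(address: int, ap: int) -> bool:
    """True when AP and A[31:5] together hold an odd number of 1 bits."""
    return address_parity_bit(address) != (ap & 1)


print(address_parity_bit(0x00000020))     # 1 (only A5 is set)
print(address_parity_bit(0x00000060))     # 0 (A5 and A6 are set)
print(address_parity_bit(0x00000018))     # 0 (A4 and A3 do not count)
print(parity_check_fails(0x00000020, 0))  # True
```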
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4.8 APCHK# (Address Parity Check) 

Output

Summary If the processor detects an address parity error during an
inquire cycle, APCHK# is asserted for one clock. The processor
does not take an internal exception as the result of detecting an
address bus parity check, and system logic must respond
appropriately to the assertion of this signal.


The processor is designed so that APCHK# does not glitch, 
enabling the signal to be used as a clocking source for system 
logic. 


Driven APCHK# is driven valid off the clock edge after the clock edge 
on which the processor samples EADS# asserted. It is negated 
off the next clock edge. 


APCHK# is always driven except in the Tri-State Test mode. 
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4.9 BE[7:0]# (Byte Enables) 
Output

Summary BE[7:0]# are used by the processor to indicate the valid data
bytes during a write cycle and the requested data bytes during
a read cycle. The byte enables can be used to derive address bits
A[2:0], which are not physically part of the processor’s address
bus. The processor checks and generates valid data parity for
the data bytes that are valid as defined by the byte enables. The
eight byte enables correspond to the eight bytes of the data bus
as follows:

■ BE7#: D[63:56]    ■ BE3#: D[31:24]
■ BE6#: D[55:48]    ■ BE2#: D[23:16]
■ BE5#: D[47:40]    ■ BE1#: D[15:8]
■ BE4#: D[39:32]    ■ BE0#: D[7:0]

The processor expects data to be driven by the system logic on
all eight bytes of the data bus during a burst cache-line read
cycle, independent of the byte enables that are asserted.

The byte enables are also used to distinguish between special
bus cycles as defined in Table 25 on page 126.

Driven and Floated

BE[7:0]# are driven off the same clock edge as ADS# and
remain in the same state until the clock edge on which NA# or
the last expected BRDY# of the cycle is sampled asserted.
BE[7:0]# are driven during memory cycles, I/O cycles, special
bus cycles, and interrupt acknowledge cycles.

The processor floats BE[7:0]# off the clock edge that BOFF# is 
sampled asserted and off the clock edge that the processor 
asserts HLDA in recognition of HOLD. Unlike the address bus, 
BE[7:0]# are not floated in response to AHOLD. 
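
The byte-enable mapping can be sketched in Python. This is an illustration only, not part of the data sheet: the function names are hypothetical, BE[7:0]# is assumed to be passed as an 8-bit integer of pin levels (bit n is BEn#, and 0 means asserted because the signals are active low), and A[2:0] is assumed to be the index of the lowest asserted byte enable.

```python
def active_lanes(be_n):
    """Return the byte lanes (0..7) whose BEn# pin is asserted (low)."""
    return [n for n in range(8) if not (be_n >> n) & 1]


def data_bits(lane):
    """Return (msb, lsb) of the D[63:0] bits qualified by BE<lane>#."""
    return (8 * lane + 7, 8 * lane)


def low_address_bits(be_n):
    """Derive A[2:0], assumed here to be the lowest asserted byte lane."""
    lanes = active_lanes(be_n)
    if not lanes:
        raise ValueError("no byte enable is asserted")
    return lanes[0]
```

For example, pin levels of 0b11001111 mean BE5# and BE4# are asserted, so the transfer uses D[47:32] and the derived A[2:0] value is 4.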

4.10 BF[2:0] (Bus Frequency) 

Inputs, Internal Pullups 

Summary BF[2:0] determine the internal operating frequency of the 
processor. The frequency of the CLK input signal is multiplied 
internally by a ratio determined by the state of these signals as 
defined in Table 18. BF[2:0] have weak internal pullups and 
default to the 3.5 multiplier if left unconnected. 

Table 18. Processor-to-Bus Clock Ratios 

State of BF[2:0] Inputs    Processor-Clock to Bus-Clock Ratio 
100b                       2.5x 
101b                       3.0x 
110b                       2.0x or 6.0x* 
111b                       3.5x 
000b                       4.5x 
001b                       5.0x 
010b                       4.0x 
011b                       5.5x 

Note: 
* The ratio selected is dependent on the stepping of the Model 8. The 2.0x 
ratio is supported on the Model 8/[7:0], whereas the 6.0x ratio is supported 
on the Model 8/[F:8]. 

Sampled BF[2:0] are sampled during the falling transition of RESET. 
They must meet a minimum setup time of 1.0 ms and a 
minimum hold time of two clocks relative to the negation of 
RESET. 
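
As a rough illustration of Table 18, the following Python sketch computes the core frequency from the BF[2:0] state and the CLK frequency. The names `BF_RATIOS` and `core_mhz` are hypothetical, and the `stepping` argument is assumed to be the 4-bit stepping number of the Model 8 (0h to 7h selects 2.0x for 110b, 8h to Fh selects 6.0x).

```python
# Table 18: BF[2:0] state -> processor-clock to bus-clock ratio.
# 110b is handled separately because it depends on the stepping.
BF_RATIOS = {
    0b100: 2.5,
    0b101: 3.0,
    0b111: 3.5,  # also the default when BF[2:0] are left unconnected
    0b000: 4.5,
    0b001: 5.0,
    0b010: 4.0,
    0b011: 5.5,
}


def core_mhz(bf, bus_mhz, stepping):
    """Return the core frequency in MHz for a BF[2:0] state and CLK frequency."""
    if not 0 <= bf <= 0b111:
        raise ValueError("BF[2:0] is a 3-bit value")
    if bf == 0b110:
        ratio = 6.0 if stepping >= 8 else 2.0
    else:
        ratio = BF_RATIOS[bf]
    return ratio * bus_mhz
```

With a 100-MHz bus clock, the unconnected default of 111b gives 350 MHz.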

4.11 BOFF# (Backoff) 

Input 

Summary If BOFF# is sampled asserted, the processor unconditionally 
aborts any cycles in progress and transitions to a bus hold state 
by floating the following signals: A[31:3], ADS#, ADSC#, AP, 
BE[7:0]#, CACHE#, D[63:0], D/C#, DP[7:0], LOCK#, M/IO#, 
PCD, PWT, SCYC, and W/R#. These signals remain floated until 
BOFF# is sampled negated. This allows an alternate bus master 
or the system to control the bus. 


When BOFF# is sampled negated, any processor cycle that was 
aborted due to the assertion of BOFF# is restarted from the 
beginning of the cycle, regardless of the number of transfers 
that were completed. If BOFF# is sampled asserted on the same 
clock edge as BRDY# of a bus cycle of any length, then BOFF# 
takes precedence over the BRDY#. In this case, the cycle is 
aborted and restarted after BOFF# is sampled negated. 


Sampled BOFF# is sampled on every clock edge. The processor floats its 
bus signals off the clock edge on which BOFF# is sampled 
asserted. These signals remain floated until the clock edge on 
which BOFF# is sampled negated. 


BOFF# is recognized while INIT and RESET are sampled 
asserted. 

4.12 BRDY# (Burst Ready) 

Input, Internal Pullup 

Summary BRDY# is asserted to the processor by system logic to indicate 
either that the data bus is being driven with valid data during a 
read cycle or that the data bus has been latched during a write 
cycle. If necessary, the system logic can insert bus cycle wait 
states by negating BRDY# until it is ready to continue the data 
transfer. BRDY# is also used to indicate the completion of 
special bus cycles. 


Sampled BRDY# is sampled every clock edge within a bus cycle starting 
with the clock edge after the clock edge that negates ADS#. 
BRDY# is ignored while the bus is idle. The processor samples 
the following inputs on the clock edge on which BRDY# is 
sampled asserted: D[63:0], DP[7:0], and KEN# during read 
cycles, EWBE# during write cycles (if not masked off), and 
WB/WT# during read and write cycles. If NA# is sampled 
asserted prior to BRDY#, then KEN# and WB/WT# are sampled 
on the clock edge on which NA# is sampled asserted. 


The number of times the processor expects to sample BRDY# 
asserted depends on the type of bus cycle, as follows: 


■ One time for a single-transfer cycle, a special bus cycle, or 
  each of two cycles in an interrupt acknowledge sequence 
■ Four times for a burst cycle (once for each data transfer) 


BRDY# can be held asserted for four consecutive clocks 
throughout the four transfers of the burst, or it can be negated 
to insert wait states. 
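
The BRDY# counting rule can be sketched as a small Python model. It is an illustration under stated assumptions, not processor behavior taken from a timing diagram: the cycle-type names and the function `completion_clock` are hypothetical, and `brdy_n_samples` is assumed to hold the BRDY# pin level (0 means asserted) on each sampled clock edge of one bus cycle.

```python
# Number of BRDY# assertions the processor expects per bus cycle.
EXPECTED_BRDY = {
    "single": 1,   # single-transfer cycle
    "special": 1,  # special bus cycle
    "inta": 1,     # each of the two interrupt acknowledge cycles
    "burst": 4,    # one per 8-byte transfer
}


def completion_clock(cycle_type, brdy_n_samples):
    """Return the index of the clock edge on which the last expected BRDY#
    is sampled asserted, or None if the cycle has not completed."""
    remaining = EXPECTED_BRDY[cycle_type]
    for clock, level in enumerate(brdy_n_samples):
        if level == 0:  # BRDY# is active low
            remaining -= 1
            if remaining == 0:
                return clock
    return None
```

A burst with BRDY# held asserted completes on the fourth sampled edge, while each negated sample adds one wait state.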

4.13 BRDYC# (Burst Ready Copy) 

Input, Internal Pullup 

Summary BRDYC# has the same function as BRDY#. In the event 
BRDY# becomes too heavily loaded due to a large fanout or 
loading in a system, BRDYC# can be used to reduce this 
loading, which improves timing. 

In addition, BRDYC# is sampled when RESET is negated to 
configure the drive strength of A[20:3], ADS#, HITM#, and 
W/R#. If BRDYC# is 0 during the falling transition of RESET, 
these particular outputs are configured using higher drive 
strengths than the standard strength. If BRDYC# is 1 during the 
falling transition of RESET, the standard strength is selected. 

Sampled BRDYC# is sampled every clock edge within a bus cycle starting 
with the clock edge after the clock edge that negates ADS#. 

BRDYC# is also sampled during the falling transition of RESET. 
If RESET is driven synchronously, BRDYC# must meet the 
specified hold time relative to the negation of RESET. If 
RESET is driven asynchronously, the minimum setup and hold 
time for BRDYC# relative to the negation of RESET is two 
clocks. 

4.14 BREQ (Bus Request) 

Output 

Summary BREQ is asserted by the processor to request the bus in order to 
complete an internally pending bus cycle. The system logic can 
use BREQ to arbitrate among the bus participants. If the 
processor does not own the bus, BREQ is asserted until the 
processor gains access to the bus in order to begin the pending 
cycle or until the processor no longer needs to run the pending 
cycle. If the processor currently owns the bus, BREQ is asserted 
with ADS#. The processor asserts BREQ for each assertion of 
ADS# but does not necessarily assert ADS# for each assertion of 
BREQ. 

Driven BREQ is asserted off the same clock edge on which ADS# is 
asserted. BREQ can also be asserted off any clock edge, 
independent of the assertion of ADS#. BREQ can be negated 
one clock edge after it is asserted. 


The processor always drives BREQ except in the Tri-State Test 
mode. 


4.15 CACHE# (Cacheable Access) 


Output 

Summary For reads, CACHE# is asserted to indicate the cacheability of 
the current bus cycle. In addition, if the processor samples 
KEN# asserted, which indicates the driven address is 
cacheable, the cycle is a 32-byte burst read cycle. For write 
cycles, CACHE# is asserted to indicate the current bus cycle is a 
modified cache-line writeback. KEN# is ignored during 
writebacks. If CACHE# is not asserted, or if KEN# is sampled 
negated during a read cycle, the cycle is not cacheable and 
defaults to a single-transfer cycle. 

Driven and Floated CACHE# is driven off the same clock edge as ADS# and remains 
in the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. 

CACHE# is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in recognition of HOLD. 
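
The way CACHE# and KEN# together select the transfer type can be summarized with a short Python sketch. The function names and return strings are hypothetical, and the arguments are assumed to be pin levels with 0 meaning asserted.

```python
def read_cycle_kind(cache_n, ken_n):
    """Classify a read cycle from the CACHE# and sampled KEN# pin levels."""
    if cache_n == 0 and ken_n == 0:
        return "burst"   # cacheable: 32-byte burst read
    return "single"      # not cacheable: single-transfer cycle


def write_cycle_kind(cache_n):
    """Classify a write cycle; KEN# is ignored for writes."""
    return "writeback" if cache_n == 0 else "single"
```

A read becomes a burst only when both signals are asserted; either one negated leaves a single-transfer cycle.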

4.16 CLK (Clock) 

Input 

Summary The CLK signal is the bus clock for the processor and is the 
reference for all signal timings under normal operation (except 
for TDI, TDO, TMS, and TRST#). BF[2:0] determine the internal 
frequency multiplier applied to CLK to obtain the processor’s 
core operating frequency. See “BF[2:0] (Bus Frequency)” on 
page 92 for a list of the processor-to-bus clock ratios. 

Sampled The CLK signal must be stable a minimum of 1.0 ms prior to the 
negation of RESET to ensure the proper operation of the 
processor. See “CLK Switching Characteristics” on page 267 for 
details regarding the CLK specifications. 


4.17 D/C# (Data/Code) 


Output 

Summary The processor drives D/C# during a memory bus cycle to 
indicate whether it is addressing data or executable code. D/C# 
is also used to define other bus cycles, including interrupt 
acknowledge and special cycles. See Table 25 on page 126 for 
more details. 

Driven and Floated D/C# is driven off the same clock edge as ADS# and remains in 
the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. D/C# is 
driven during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


D/C# is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in recognition of HOLD. 

4.18 D[63:0] (Data Bus) 

Bidirectional 

Summary D[63:0] represent the processor’s 64-bit data bus. Each of the 
eight bytes of data that comprise this bus is qualified as valid 
by its corresponding byte enable. See “BE[7:0]# (Byte 
Enables)” on page 91. 

Driven, Sampled, and Floated As Outputs: For single-transfer write cycles, the processor drives 
D[63:0] with valid data one clock edge after the clock edge on 
which ADS# is asserted and D[63:0] remain in the same state 
until the clock edge on which BRDY# is sampled asserted. If the 
cycle is a writeback—in which case four, 8-byte transfers 
occur—D[63:0] are driven one clock edge after the clock edge 
on which ADS# is asserted and are subsequently changed off 
the clock edge on which each BRDY# assertion of the burst 
cycle is sampled. 


If the assertion of ADS# represents a pipelined write cycle that 
follows a read cycle, the processor does not drive D[63:0] until it 
is certain that contention on the data bus will not occur. In this 
case, D[63:0] are driven the clock edge after the last expected 
BRDY# of the previous cycle is sampled asserted. 


As Inputs: During read cycles, the processor samples D[63:0] on 
the clock edge on which BRDY# is sampled asserted. 


The processor always floats D[63:0] except when they are being 
driven during a write cycle as described above. In addition, 
D[63:0] are floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in recognition of HOLD. 





Preliminary Information 
AMD-K6®-2 Processor Data Sheet, 21850J/0, February 2000 


4.19 DP[7:0] (Data Parity) 
Bidirectional 
Summary DP[7:0] are even parity bits for each valid byte of data (as 
defined by BE[7:0]#) driven and sampled on the D[63:0] data 
bus. Even parity means that the total number of 1 bits within 
each byte of data and its respective data parity bit is an even 
number. DP[7:0] are driven by the processor during write cycles 
and sampled by the processor during read cycles. If the 
processor detects bad parity on any valid byte of data during a 
read cycle, PCHK# is asserted for one clock beginning the clock 
edge after BRDY# is sampled asserted. The processor does not 
take an internal exception as the result of detecting a data 
parity check, and system logic must respond appropriately to 
the assertion of this signal. 

The eight data parity bits correspond to the eight bytes of the 
data bus as follows: 

■ DP7: D[63:56]    ■ DP3: D[31:24] 
■ DP6: D[55:48]    ■ DP2: D[23:16] 
■ DP5: D[47:40]    ■ DP1: D[15:8] 
■ DP4: D[39:32]    ■ DP0: D[7:0] 

For systems that do not support data parity, DP[7:0] should be 
connected to Vcc3 through pullup resistors. 

Driven, Sampled, and Floated As Outputs: For single-transfer write cycles, the processor drives 
DP[7:0] with valid parity one clock edge after the clock edge on 
which ADS# is asserted and DP[7:0] remain in the same state 
until the clock edge on which BRDY# is sampled asserted. If the 
cycle is a writeback, DP[7:0] are driven one clock edge after the 
clock edge on which ADS# is asserted and are subsequently 
changed off the clock edge on which each BRDY# assertion of 
the burst cycle is sampled. 


As Inputs: During read cycles, the processor samples DP[7:0] on 
the clock edge on which BRDY# is sampled asserted. 


The processor always floats DP[7:0] except when they are being 
driven during a write cycle as described above. In addition, 
DP[7:0] are floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts 
HLDA in recognition of HOLD. 
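
The even-parity rule for DP[7:0] can be illustrated with a brief Python sketch. The function names are hypothetical, `data` is assumed to be the 64-bit value on D[63:0], `dp` the 8-bit value on DP[7:0], and `be_n` the BE[7:0]# pin levels with 0 meaning asserted.

```python
def even_parity_bit(byte):
    """Parity bit that makes the count of 1 bits (byte plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def data_parity(data, be_n=0x00):
    """Return {lane: DPn} for each byte lane whose BEn# is asserted (low)."""
    return {
        lane: even_parity_bit(data >> (8 * lane))
        for lane in range(8)
        if not (be_n >> lane) & 1
    }


def bad_lanes(data, dp, be_n=0x00):
    """Return the valid byte lanes whose sampled parity bit is wrong."""
    expected = data_parity(data, be_n)
    return [lane for lane, bit in expected.items() if ((dp >> lane) & 1) != bit]
```

A non-empty result from `bad_lanes` corresponds to the condition under which the text says PCHK# is asserted during a read cycle.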

4.20 EADS# (External Address Strobe) 

Input 


Summary System logic asserts EADS# during a cache inquire cycle to 
indicate that the address bus contains a valid address. EADS# 
can only be driven after the system logic has taken control of 
the address bus by asserting AHOLD or BOFF# or by receiving 
HLDA. The processor responds to the sampling of EADS# and 
the address bus by driving HIT#, which indicates if the inquired 
cache line exists in the processor’s cache, and HITM#, which 
indicates if it is in the modified state. 


Sampled If AHOLD or BOFF# is asserted by the system logic in order to 
execute a cache inquire cycle, the processor begins sampling 
EADS# two clock edges after AHOLD or BOFF# is sampled 
asserted. If the system logic asserts HOLD in order to execute a 
cache inquire cycle, the processor begins sampling EADS# two 
clock edges after the clock edge HLDA is asserted by the 
processor. 


EADS# is ignored during the following conditions: 

■ One clock edge after the clock edge on which EADS# is 
  sampled asserted 
■ Two clock edges after the clock edge on which ADS# is 
  asserted 
■ When the processor is driving the address bus 
■ When the processor asserts HITM# 

4.21 EWBE# (External Write Buffer Empty) 

Input 

Summary The system logic can negate EWBE# to the processor to indicate 
that its external write buffers are full and that additional data 
cannot be stored at this time. This causes the processor to delay 
the following activities until EWBE# is sampled asserted: 

■ The commitment of write hit cycles to cache lines in the 
  modified state or exclusive state in the processor’s cache 
■ The decode and execution of an instruction that follows a 
  currently-executing serializing instruction 
■ The assertion or negation of SMIACT# 
■ The entering of the Halt state and the Stop Grant state 


Negating EWBE# does not prevent the completion of any type 
of cycle that is currently in progress. 


Sampled The processor samples EWBE# on each clock edge that BRDY# 
is sampled asserted during all memory write cycles (except 
writeback cycles), I/O write cycles, and special bus cycles. 


If EWBE# is sampled negated, it is sampled on every clock edge 
until it is asserted, and then it is ignored until BRDY# is 
sampled asserted in the next write cycle or special cycle. 


On the AMD-K6-2 Model 8/[F:8] processor, if EFER[3] is set to 
1, then EWBE# is ignored by the processor. For more 
information on the EFER settings and EWBE#, see “EWBE 
Control” on page 201. 

4.22 FERR# (Floating-Point Error) 

Output 


Summary The assertion of FERR# indicates the occurrence of an 
unmasked floating-point exception resulting from the 
execution of a floating-point instruction. This signal is provided 
to allow the system logic to handle this exception in a manner 
consistent with IBM-compatible PC/AT systems. See “Handling 
Floating-Point Exceptions” on page 207 for a system logic 
implementation that supports floating-point exceptions. 


The state of the numeric error (NE) bit in CR0 does not affect 
the FERR# signal. 


The processor is designed so that FERR# does not glitch, 
enabling the signal to be used as a clocking source for system 
logic. 


Driven The processor asserts FERR# on the instruction boundary of 
the next floating-point instruction, MMX instruction, 3DNow! 
instruction, or WAIT instruction that occurs following the 
floating-point instruction that caused the unmasked 
floating-point exception—that is, FERR# is not asserted at the 
time the exception occurs. The IGNNE# signal does not affect 
the assertion of FERR#. 


FERR# is negated during the following conditions: 

■ Following the successful execution of the floating-point 
  instructions FCLEX, FINIT, FSAVE, and FSTENV 
■ Under certain circumstances, following the successful 
  execution of the floating-point instructions FLDCW, 
  FLDENV, and FRSTOR, which load the floating-point status 
  word or the floating-point control word 
■ Following the falling transition of RESET 

FERR# is always driven except in the Tri-State Test mode. 


See “IGNNE# (Ignore Numeric Exception)” on page 106 for 
more details on floating-point exceptions. 

4.23 FLUSH# (Cache Flush) 

Input 

Summary In response to sampling FLUSH# asserted, the processor writes 
back any data cache lines that are in the modified state, 
invalidates all lines in the instruction and data caches, and then 
executes a flush acknowledge special cycle. See Table 25 on 
page 126 for the bus definition of special cycles. 


In addition, FLUSH# is sampled when RESET is negated to 
determine if the processor enters the Tri-State Test mode. If 
FLUSH# is 0 during the falling transition of RESET, the 
processor enters the Tri-State Test mode instead of performing 
the normal RESET functions. 


Sampled FLUSH# is sampled and latched as a falling edge-sensitive 
signal. During normal operation (not RESET), FLUSH# is 
sampled on every clock edge but is not recognized until the next 
instruction boundary. If FLUSH# is asserted synchronously, it 
can be asserted for a minimum of one clock. If FLUSH# is 
asserted asynchronously, it must have been negated for a 
minimum of two clocks, followed by an assertion of a minimum 
of two clocks. 


FLUSH# is also sampled during the falling transition of RESET. 
If RESET and FLUSH# are driven synchronously, FLUSH# is 
sampled on the clock edge prior to the clock edge on which 
RESET is sampled negated. If RESET is driven asynchronously, 
the minimum setup and hold time for FLUSH#, relative to the 
negation of RESET, is two clocks. 

4.24 HIT# (Inquire Cycle Hit) 

Output 

Summary The processor asserts HIT# during an inquire cycle to indicate 
that the cache line is valid within the processor’s instruction or 
data cache (also known as a cache hit). The cache line can be in 
the modified, exclusive, or shared state. 

Driven HIT# is always driven (except in the Tri-State Test mode) and 
only changes state the clock edge after the clock edge on which 
EADS# is sampled asserted. It is driven in the same state until 
the next inquire cycle. 

4.25 HITM# (Inquire Cycle Hit To Modified Line) 

Output 

Summary The processor asserts HITM# during an inquire cycle to 
indicate that the cache line exists in the processor’s data cache 
in the modified state. The processor performs a writeback cycle 
as a result of this cache hit. If an inquire cycle hits a cache line 
that is currently being written back, the processor asserts 
HITM# but does not execute another writeback cycle. The 
system logic must not expect the processor to assert ADS# each 
time HITM# is asserted. 

Driven HITM# is always driven (except in the Tri-State Test mode) 
and, in particular, is driven to represent the result of an inquire 
cycle the clock edge after the clock edge on which EADS# is 
sampled asserted. If HITM# is negated in response to the 
inquire address, it remains negated until the next inquire cycle. 
If HITM# is asserted in response to the inquire address, it 
remains asserted throughout the writeback cycle and is negated 
one clock edge after the last BRDY# of the writeback is 
sampled asserted. 

4.26 HLDA (Hold Acknowledge) 

Output 

Summary When HOLD is sampled asserted, the processor completes the 
current bus cycles, floats the processor bus, and asserts HLDA 
in an acknowledgment that these events have been completed. 
The processor does not assert HLDA until the completion of a 
locked sequence of cycles. While HLDA is asserted, another bus 
master can drive cycles on the bus, including inquire cycles to 
the processor. The following signals are floated when HLDA is 
asserted: A[31:3], ADS#, ADSC#, AP, BE[7:0]#, CACHE#, 
D[63:0], D/C#, DP[7:0], LOCK#, M/IO#, PCD, PWT, SCYC, and 
W/R#. 

The processor is designed so that HLDA does not glitch. 

Driven HLDA is always driven except in the Tri-State Test mode. If a 
processor cycle is in progress while HOLD is sampled asserted, 
HLDA is asserted one clock edge after the last BRDY# of the 
cycle is sampled asserted. If the bus is idle, HLDA is asserted 
one clock edge after HOLD is sampled asserted. HLDA is 
negated one clock edge after the clock edge on which HOLD is 
sampled negated. 

The assertion of HLDA is independent of the sampled state of 
BOFF#. 

The processor floats the bus every clock in which HLDA is 
asserted. 

4.27 HOLD (Bus Hold Request) 

Input 

Summary The system logic can assert HOLD to gain control of the 
processor’s bus. When HOLD is sampled asserted, the processor 
completes the current bus cycles, floats the processor bus, and 
asserts HLDA in an acknowledgment that these events have 
been completed. 

Sampled The processor samples HOLD on every clock edge. If a 
processor cycle is in progress while HOLD is sampled asserted, 
HLDA is asserted one clock edge after the last BRDY# of the 
cycle is sampled asserted. If the bus is idle, HLDA is asserted 
one clock edge after HOLD is sampled asserted. HOLD is 
recognized while INIT and RESET are sampled asserted. 

4.28 IGNNE# (Ignore Numeric Exception) 

Input 

Summary IGNNE#, in conjunction with the numeric error (NE) bit in CR0, 
is used by the system logic to control the effect of an unmasked 
floating-point exception on a previous floating-point instruction 
during the execution of a floating-point instruction, MMX 
instruction, 3DNow! instruction, or the WAIT instruction 
(hereafter referred to as the target instruction). 


If an unmasked floating-point exception is pending and the 
target instruction is considered error-sensitive, then the 
relationship between NE and IGNNE# is as follows: 

■ If NE = 0, then: 

  ◆ If IGNNE# is sampled asserted, the processor ignores the 
    floating-point exception and continues with the 
    execution of the target instruction. 

  ◆ If IGNNE# is sampled negated, the processor waits until 
    it samples IGNNE#, INTR, SMI#, NMI, or INIT asserted. 

    If IGNNE# is sampled asserted while waiting, the 
    processor ignores the floating-point exception and 
    continues with the execution of the target instruction. 

    If INTR, SMI#, NMI, or INIT is sampled asserted while 
    waiting, the processor handles its assertion 
    appropriately. 

■ If NE = 1, the processor invokes the INT 10h exception 
  handler. 


If an unmasked floating-point exception is pending and the 
target instruction is considered error-insensitive, then the 
processor ignores the floating-point exception and continues 
with the execution of the target instruction. 


FERR# is not affected by the state of the NE bit or IGNNE#. 
FERR# is always asserted at the instruction boundary of the 
target instruction that follows the floating-point instruction 
that caused the unmasked floating-point exception. 


This signal is provided to allow the system logic to handle 
exceptions in a manner consistent with IBM-compatible PC/AT 
systems. 

Sampled The processor samples IGNNE# as a level-sensitive input on 
every clock edge. The system logic can drive the signal either 
synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. 
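
The relationship between NE and IGNNE# described in the summary can be written as a small decision function. This Python sketch is illustrative only: the function name and the returned strings are hypothetical, `ignne_n` is assumed to be the IGNNE# pin level (0 means asserted), and an unmasked floating-point exception is assumed to be pending when the target instruction is reached.

```python
def pending_fp_exception_action(ne, ignne_n, error_sensitive):
    """Decide what happens at the target instruction while an unmasked
    floating-point exception is pending."""
    if not error_sensitive:
        return "ignore"   # error-insensitive target: continue execution
    if ne == 1:
        return "int10h"   # invoke the INT 10h exception handler
    if ignne_n == 0:
        return "ignore"   # NE = 0 and IGNNE# asserted: continue execution
    # NE = 0 and IGNNE# negated: wait for IGNNE#, INTR, SMI#, NMI, or INIT
    return "wait"
```

FERR# is not part of this decision because, as the text states, its assertion is unaffected by NE and IGNNE#.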


4.29 INIT (Initialization) 


Input 

Summary The assertion of INIT causes the processor to empty its 
pipelines, to initialize most of its internal state, and to branch 
to address FFFF_FFF0h, the same instruction execution 
starting point used after RESET. Unlike RESET, the processor 
preserves the contents of its caches, the floating-point state, the 
MMX™ state, Model-Specific Registers, the CD and NW bits of 
the CR0 register, and other specific internal resources. 

INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode. 

Sampled INIT is sampled and latched as a rising edge-sensitive signal. 
INIT is sampled on every clock edge but is not recognized until 
the next instruction boundary. During an I/O write cycle, it must 
be sampled asserted a minimum of three clock edges before 
BRDY# is sampled asserted if it is to be recognized on the 
boundary between the I/O write instruction and the following 
instruction. 

If INIT is asserted synchronously, it can be asserted for a 
minimum of one clock. If it is asserted asynchronously, it must 
have been negated for a minimum of two clocks, followed by an 
assertion of a minimum of two clocks. 

4.30 INTR (Maskable Interrupt) 

Input 


Summary INTR is the system’s maskable interrupt input to the processor. 
When the processor samples and recognizes INTR asserted, the 
processor executes a pair of interrupt acknowledge bus cycles 
and then jumps to the interrupt service routine specified by the 
interrupt number that was returned during the interrupt 
acknowledge sequence. The processor only recognizes INTR if 
the interrupt flag (IF) in the EFLAGS register equals 1. 


Sampled The processor samples INTR as a level-sensitive input on every 
clock edge, but the interrupt request is not recognized until the 
next instruction boundary. The system logic can drive INTR 
either synchronously or asynchronously. If it is asserted 
asynchronously, it must be asserted for a minimum pulse width 
of two clocks. In order to be recognized, INTR must remain 
asserted until an interrupt acknowledge sequence is complete. 


4.31 INV (Invalidation Request) 

Input 

Summary During an inquire cycle, the state of INV determines whether 
an addressed cache line that is found in the processor’s 
instruction or data cache transitions to the invalid state or the 
shared state. 


If INV is sampled asserted during an inquire cycle, the 
processor transitions the cache line (if found) to the invalid 
state, regardless of its previous state. If INV is sampled negated 
during an inquire cycle, the processor transitions the cache line 
(if found) to the shared state. In either case, if the cache line is 
found in the modified state, the processor writes it back to 
memory before changing its state. 


Sampled INV is sampled on the clock edge on which EADS# is sampled 
asserted. 
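
The combined effect of an inquire cycle on HIT#, HITM#, and the cache-line state can be sketched in Python. This is a simplified model built from the signal descriptions, not a statement of the processor's implementation: the function name is hypothetical, the line state is assumed to be one of the MESI letters, and the case of a line that is already being written back is not modeled.

```python
def inquire(state, inv):
    """Model one inquire cycle for a line in MESI state 'M', 'E', 'S', or 'I'.

    Returns (hit, hitm, writeback, new_state), where hit and hitm indicate
    that HIT# and HITM# are asserted.
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("state must be one of M, E, S, I")
    if state == "I":
        return (False, False, False, "I")  # line not found in the cache
    modified = state == "M"
    new_state = "I" if inv else "S"        # INV asserted selects invalid
    # A modified line is written back before its state changes.
    return (True, modified, modified, new_state)
```

For example, `inquire("M", inv=False)` reports a hit to a modified line, a writeback, and a final shared state.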

4.32 KEN# (Cache Enable) 

Input 

Summary If KEN# is sampled asserted, it indicates that the address 
presented by the processor is cacheable. If KEN# is sampled 
asserted and the processor intends to perform a cache-line fill 
(signified by the assertion of CACHE#), the processor executes 
a 32-byte burst read cycle and expects to sample BRDY# 
asserted a total of four times. If KEN# is sampled negated 
during a read cycle, a single-transfer cycle is executed and the 
processor does not cache the data. For write cycles, CACHE# is 
asserted to indicate the current bus cycle is a modified 
cache-line writeback. KEN# is ignored during writebacks. 

If PCD is asserted during a bus cycle, the processor does not 
cache any data read during that cycle, regardless of the state of 
KEN#. See “PCD (Page Cache Disable)” on page 113 for more 
details. 

If the processor has sampled the state of KEN# during a cycle, 
and that cycle is aborted due to the sampling of BOFF# 
asserted, the system logic must ensure that KEN# is sampled in 
the same state when the processor restarts the aborted cycle. 

Sampled KEN# is sampled on the clock edge on which the first BRDY# or 
NA# of a read cycle is sampled asserted. If the read cycle is a 
burst, KEN# is ignored during the last three assertions of 
BRDY#. KEN# is sampled during read cycles only when 
CACHE# is asserted. 

4.33 LOCK# (Bus Lock) 

Output 

Summary The processor asserts LOCK# during a sequence of bus cycles to 
ensure that the cycles are completed without allowing other bus 
masters to intervene. Locked operations consist of two to five 
bus cycles. LOCK# is asserted during the following operations: 

■ An interrupt acknowledge sequence 
■ Descriptor Table accesses 
■ Page Directory and Page Table accesses 
■ XCHG instruction 
■ An instruction with an allowable LOCK prefix 

In order to ensure that locked operations appear on the bus and 
are visible to the entire system, any data operands addressed 
during a locked cycle that reside in the processor’s cache are 
flushed and invalidated from the cache prior to the locked 
operation. If the cache line is in the modified state, it is written 
back and invalidated prior to the locked operation. Likewise, 
any data read during a locked operation is not cached. 

The processor is designed so that LOCK# does not glitch. 

Driven and Floated During a locked cycle, LOCK# is asserted off the same clock 
edge on which ADS# is asserted and remains asserted until the 
last BRDY# of the last bus cycle is sampled asserted. The 
processor negates LOCK# for at least one clock between 
consecutive sequences of locked operations to allow the system 
logic to arbitrate for the bus. 


LOCK# is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. When LOCK# is floated due to BOFF# 
sampled asserted, the system logic is responsible for preserving 
the lock condition while LOCK# is in the high-impedance state. 

4.34 M/IO# (Memory or I/O) 

Output 

Summary The processor drives M/IO# during a bus cycle to indicate 
whether it is addressing the memory or I/O space. If M/IO# = 1, 
the processor is addressing memory or a memory-mapped I/O 
port as the result of an instruction fetch or an instruction that 
loads or stores data. If M/IO# = 0, the processor is addressing an 
I/O port during the execution of an I/O instruction. In addition, 
M/IO# is used to define other bus cycles, including interrupt 
acknowledge and special cycles. See Table 25 on page 126 for 
more details. 

Driven and Floated M/IO# is driven off the same clock edge as ADS# and remains in 
the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. M/IO# is 
driven during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


M/IO# is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. 
4.35 NA# (Next Address) 
Input 


Summary System logic asserts NA# to indicate to the processor that it is 
ready to accept another bus cycle pipelined into the previous 
bus cycle. ADS#, along with address and status signals, can be 
asserted as early as one clock edge after NA# is sampled 
asserted if the processor is prepared to start a new cycle. 
Because the processor allows a maximum of two cycles to be in 
progress at a time, the assertion of NA# is sampled while two 
cycles are in progress but ADS# is not asserted until the 
completion of the first cycle. 


Sampled NA# is sampled every clock edge during bus cycles, starting one 
clock edge after the clock edge that negates ADS#, until the last 
expected BRDY# of the last executed cycle is sampled asserted 
(with the exception of the clock edge after the clock edge that 
negates the ADS# for a second pending cycle). Because the 
processor latches NA# when sampled, the system logic only 
needs to assert NA# for one clock. 


4.36 NMI (Non-Maskable Interrupt) 
Input 


Summary When NMI is sampled asserted, the processor jumps to the 
interrupt service routine defined by interrupt number 02h. 
Unlike the INTR signal, software cannot mask the effect of NMI 
if it is sampled asserted by the processor. However, NMI is 
temporarily masked upon entering System Management Mode 
(SMM). In addition, an interrupt acknowledge cycle is not 
executed because the interrupt number is predefined. 


If NMI is sampled asserted while the processor is executing the 
interrupt service routine for a previous NMI, the subsequent 
NMI remains pending until the completion of the execution of 
the IRET instruction at the end of the interrupt service routine. 


Sampled NMI is sampled and latched as a rising edge-sensitive signal. 
During normal operation, NMI is sampled on every clock edge 
but is not recognized until the next instruction boundary. If it is 
asserted synchronously, it can be asserted for a minimum of one 
clock. If it is asserted asynchronously, it must have been 
negated for a minimum of two clocks, followed by an assertion 
of a minimum of two clocks. 
4.37 PCD (Page Cache Disable)
Output


Summary The processor drives PCD to indicate the operating system’s 
specification of cacheability for the page being addressed. 
System logic can use PCD to control external caching. If PCD is 
asserted, the addressed page is not cached. If PCD is negated, 
the cacheability of the addressed page depends upon the state 
of CACHE# and KEN#. 


The state of PCD depends upon the processor’s operating mode 
and the state of certain bits in its control registers and TLB as 
follows: 


■ In Real mode, or in Protected and Virtual-8086 modes while 
paging is disabled (PG bit in CR0 set to 0): 
PCD output = CD bit in CR0 

■ In Protected and Virtual-8086 modes while caching is 
enabled (CD bit in CR0 set to 0) and paging is enabled (PG 
bit in CR0 set to 1): 

• For accesses to I/O space, page directory entries, and 
other non-paged accesses: 
PCD output = PCD bit in CR3 

• For accesses to 4-Kbyte page table entries or 4-Mbyte 
pages: 
PCD output = PCD bit in page directory entry 

• For accesses to 4-Kbyte pages: 
PCD output = PCD bit in page table entry 


Driven and Floated PCD is driven off the same clock edge as ADS# and remains in 
the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. 


PCD is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. 
4.38 PCHK# (Parity Check) 
Output 


Summary The processor asserts PCHK# if it detects an 
even parity error on one or more valid bytes of D[63:0] during a 
read cycle. (Even parity means that the total number of 1 bits 
within each byte of data and its respective data parity bit is 
even.) The processor checks data parity for the data bytes that 
are valid, as defined by BE[7:0]#, the byte enables. 
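
The even-parity rule above can be sketched as follows. This is an illustrative Python model only, not a description of the processor's internal logic: the function name `pchk_asserted` and the packing of D[63:0], DP[7:0], and BE[7:0]# into integers are assumptions, with bit i of DP and of BE# taken to correspond to data byte i.

```python
def pchk_asserted(data: int, dp: int, be_n: int) -> bool:
    """Return True if a parity error would be flagged for this read.

    data -- 64-bit value sampled on D[63:0]
    dp   -- 8-bit value sampled on DP[7:0] (assumed: bit i covers byte i)
    be_n -- 8-bit BE[7:0]# value (active Low: bit i == 0 means byte i is valid)
    """
    for i in range(8):
        if (be_n >> i) & 1:
            continue  # byte is not enabled, so its parity is not checked
        byte = (data >> (8 * i)) & 0xFF
        parity_bit = (dp >> i) & 1
        ones = bin(byte).count("1") + parity_bit
        if ones % 2 != 0:
            return True  # odd total of 1 bits: even-parity error on a valid byte
    return False
```

For example, with only byte 0 enabled (BE[7:0]# = FEh) and D[7:0] = 01h, the data byte holds a single 1 bit, so DP0 must be 1 for the total to be even.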


PCHK# is always driven but is only asserted for memory and I/O 
read bus cycles and the second cycle of an interrupt 
acknowledge sequence. PCHK# is not asserted during any type of 
write cycle or special bus cycle. The processor does not take 
an internal exception as the result of detecting a data parity 
error, and system logic must respond appropriately to the 
assertion of this signal. 


The processor is designed so that PCHK# does not glitch, 
enabling the signal to be used as a clocking source for system 
logic. 


Driven PCHK# is always driven except in the Tri-State Test mode. For 
each BRDY# returned to the processor during a read cycle with 
a parity error detected on the data bus, PCHK# is asserted for 
one clock, one clock edge after BRDY# is sampled asserted. 
4.39 PWT (Page Writethrough)
Output


Summary The processor drives PWT to indicate the operating system’s 
specification of the writeback state or writethrough state for 
the page being addressed. PWT, together with WB/WT#, 
specifies the data cache-line state during cacheable read misses 
and write hits to shared cache lines. See “WB/WT# (Writeback 
or Writethrough)” on page 123 for more details. 


The state of PWT depends upon the processor’s operating mode 
and the state of certain bits in its control registers and TLB as 
follows: 


■ In Real mode, or in Protected and Virtual-8086 modes while 
paging is disabled (PG bit in CR0 set to 0): 
PWT output = 0 (writeback state) 

■ In Protected and Virtual-8086 modes while paging is 
enabled (PG bit in CR0 set to 1): 

• For accesses to I/O space, page directory entries, and 
other non-paged accesses: 
PWT output = PWT bit in CR3 

• For accesses to 4-Kbyte page table entries or 4-Mbyte 
pages: 
PWT output = PWT bit in page directory entry 

• For accesses to 4-Kbyte pages: 
PWT output = PWT bit in page table entry 


Driven and Floated PWT is driven off the same clock edge as ADS# and remains in 
the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. 


PWT is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. 
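
The selection of the PWT output described above can be summarized in a short sketch. The Python below is illustrative only; the `Access` enumeration, the function name `pwt_output`, and its parameters are hypothetical names introduced for this example and do not correspond to any processor or software interface.

```python
from enum import Enum


class Access(Enum):
    """Access categories, following the list in the PWT description."""
    NON_PAGED = 1   # I/O space, page directory entries, other non-paged accesses
    PDE_LEVEL = 2   # 4-Kbyte page table entries or 4-Mbyte pages
    PAGE_4K = 3     # 4-Kbyte pages


def pwt_output(paging_enabled: bool, access: Access,
               cr3_pwt: int = 0, pde_pwt: int = 0, pte_pwt: int = 0) -> int:
    """Return the value driven on PWT for one access.

    paging_enabled is False in Real mode, or when the PG bit in CR0 is 0.
    """
    if not paging_enabled:
        return 0                 # writeback state
    if access is Access.NON_PAGED:
        return cr3_pwt           # PWT bit in CR3
    if access is Access.PDE_LEVEL:
        return pde_pwt           # PWT bit in the page directory entry
    return pte_pwt               # PWT bit in the page table entry
```

The bit that supplies PWT is the one held by the paging structure that maps the access, so a 4-Kbyte page takes it from the page table entry while the page table entry itself takes it from the page directory entry.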
4.40 RESET (Reset) 
Input 


Summary When the processor samples RESET asserted, it immediately 
flushes and initializes all internal resources and its internal 
state including its pipelines and caches, the floating-point 
state, the MMX state, the 3DNow! state, and all registers, and 
then the processor jumps to address FFFF_FFF0h to start 
instruction execution. 


The signals BRDYC# and FLUSH# are sampled during the 
falling transition of RESET to select the drive strength of 
selected output signals and to invoke the Tri-State Test mode, 
respectively. See these signal descriptions for more details. 


Sampled RESET is sampled as a level-sensitive input on every clock 
edge. System logic can drive the signal either synchronously or 
asynchronously. 


During the initial power-on reset of the processor, RESET must 
remain asserted for a minimum of 1.0 ms after CLK and Vcc 
reach specification before it is negated. 


During a warm reset, while CLK and Vcc are within their 
specification, RESET must remain asserted for a minimum of 
15 clocks prior to its negation. 
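
As a worked example of the two RESET requirements, the sketch below converts them to nanoseconds. It is illustrative only: the function name is hypothetical, and the bus frequency is a parameter supplied by the caller rather than a value taken from this section.

```python
def min_reset_assertion_ns(bus_clock_mhz: float, power_on: bool) -> float:
    """Minimum time RESET must stay asserted, in nanoseconds.

    power_on=True  -- 1.0 ms, measured after CLK and Vcc reach specification
    power_on=False -- warm reset: 15 bus clocks
    """
    if power_on:
        return 1.0e6                      # 1.0 ms expressed in ns
    clock_period_ns = 1000.0 / bus_clock_mhz
    return 15 * clock_period_ns
```

With a 100-MHz bus clock, the warm-reset minimum of 15 clocks corresponds to 150 ns, far shorter than the 1.0 ms required at power-on.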


4.41 RSVD (Reserved) 


Summary Reserved signals are a special class of pins that can be treated 
in one of the following ways: 


■ As no-connect (NC) pins, in which case these pins are left 
unconnected 


■ As pins connected to the system logic as defined by the 
industry-standard Super7 and Socket 7 interface 


■ Any combination of NC and Socket 7 pins 


In any case, if the RSVD pins are treated accordingly, the 
normal operation of the AMD-K6-2 processor is not adversely 
affected in any manner. 


See “Pin Designations” on page 297 for a list of the locations of 
the RSVD pins. 
4.42 SCYC (Split Cycle)
Output


Summary The processor asserts SCYC during misaligned, locked transfers 
on the D[63:0] data bus. The processor generates additional bus 
cycles to complete the transfer of misaligned data. 


For purposes of bus cycles, the term aligned means: 

■ Any 1-byte transfers 

■ 2-byte and 4-byte transfers that lie within 4-byte address 
boundaries 

■ 8-byte transfers that lie within 8-byte address boundaries 


Driven and Floated SCYC is asserted off the same clock edge as ADS#, and negated 
off the clock edge on which NA# or the last expected BRDY# of 
the entire locked sequence is sampled asserted. SCYC is only 
valid during locked memory cycles. 


SCYC is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. 
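
The definition of aligned given above can be expressed as a small predicate. This is a sketch under stated assumptions: the function name `is_aligned` is hypothetical, addresses are byte addresses, and only the transfer sizes named in the definition (1, 2, 4, and 8 bytes) are accepted.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if a transfer is aligned for bus-cycle purposes."""
    if size == 1:
        return True                          # any 1-byte transfer
    if size in (2, 4):
        # must lie entirely within one 4-byte address boundary
        return (address % 4) + size <= 4
    if size == 8:
        # must lie entirely within one 8-byte address boundary
        return address % 8 == 0
    raise ValueError("transfer size must be 1, 2, 4, or 8 bytes")
```

Under this reading, a 2-byte transfer at byte offset 1 within a doubleword is aligned, while one at offset 3 is not. A locked transfer for which the predicate is false is the case in which SCYC is asserted.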


4.43 SMI# (System Management Interrupt)
Input, Internal Pullup


Summary The assertion of SMI# causes the processor to enter System 
Management Mode (SMM). Upon recognizing SMI#, the 
processor performs the following actions, in the order shown: 


1. Flushes its instruction pipelines 
2. Completes all pending and in-progress bus cycles 


3. Acknowledges the interrupt by asserting SMIACT# after 
sampling EWBE# asserted (if EWBE# is masked off, then 
SMIACT# is not affected by EWBE#) 


4. Saves the internal processor state in SMM memory 


5. Disables interrupts by clearing the interrupt flag (IF) in 
EFLAGS and disables NMI interrupts 


6. Jumps to the entry point of the SMM service routine at the 
SMM base physical address which defaults to 0003_8000h in 
SMM memory 
See “System Management Mode (SMM)” on page 211 for more 
details regarding SMM. 


Sampled SMI# is sampled and latched as a falling edge-sensitive signal. 
SMI# is sampled on every clock edge but is not recognized until 
the next instruction boundary. If SMI# is to be recognized on 
the instruction boundary associated with a BRDY#, it must be 
sampled asserted a minimum of three clock edges before the 
BRDY# is sampled asserted. If it is asserted synchronously, it 
can be asserted for a minimum of one clock. If it is asserted 
asynchronously, it must have been negated for a minimum of 
two clocks followed by an assertion of a minimum of two clocks. 


A second assertion of SMI# while in SMM is latched but is not 
recognized until the SMM service routine is exited. 


4.44 SMIACT# (System Management Interrupt Active) 
Output 


Summary The processor acknowledges the assertion of SMI# with the 
assertion of SMIACT# to indicate that the processor has 
entered System Management Mode (SMM). The system logic 
can use SMIACT# to enable SMM memory. See “SMI# (System 
Management Interrupt)” on page 117 for more details. 


See “System Management Mode (SMM)” on page 211 for more 
details regarding SMM. 


Driven The processor asserts SMIACT# after the last BRDY# of the last 
pending bus cycle is sampled asserted (including all pending 
write cycles) and after EWBE# is sampled asserted (if EWBE# 
is masked off, then SMIACT# is not affected by EWBE#). 
SMIACT# remains asserted until after the last BRDY# of the 
last pending bus cycle associated with exiting SMM is sampled 
asserted. 


SMIACT# remains asserted during any flush, internal snoop, or 
writeback cycle due to an inquire cycle. 
4.45 STPCLK# (Stop Clock)
Input, Internal Pullup


Summary The assertion of STPCLK# causes the processor to enter the 
Stop Grant state, during which the processor’s internal clock is 
stopped. From the Stop Grant state, the processor can 
subsequently transition to the Stop Clock state, in which the 
bus clock CLK is stopped. Upon recognizing STPCLK#, the 
processor performs the following actions, in the order shown: 


1. Flushes its instruction pipelines 
2. Completes all pending and in-progress bus cycles 


3. Acknowledges the STPCLK# assertion by executing a Stop 
Grant special bus cycle (see Table 25 on page 126) 


4. Stops its internal clock after BRDY# of the Stop Grant 
special bus cycle is sampled asserted and after EWBE# is 
sampled asserted (if EWBE# is masked off, then entry into 
the Stop Grant state is not affected by EWBE#) 


5. Enters the Stop Clock state if the system logic stops the bus 
clock CLK (optional) 


See “Clock Control” on page 243 for more details regarding 
clock control. 


Sampled STPCLK# is sampled as a level-sensitive input on every clock 
edge but is not recognized until the next instruction boundary. 
System logic can drive the signal either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. 


STPCLK# must remain asserted until recognized, which is 
indicated by the completion of the Stop Grant special cycle. 


4.46 TCK (Test Clock)
Input, Internal Pullup


Summary TCK is the clock for boundary-scan testing using the Test 
Access Port (TAP). See “Boundary-Scan Test Access Port 
(TAP)” on page 223 for details regarding the operation of the 
TAP controller. 
Sampled The processor always samples TCK, except while TRST# is 
asserted. 

4.47 TDI (Test Data Input) 
Input, Internal Pullup 

Summary TDI is the serial test data and instruction input for 
boundary-scan testing using the Test Access Port (TAP). See 
“Boundary-Scan Test Access Port (TAP)” on page 223 for details 
regarding the operation of the TAP controller. 

Sampled The processor samples TDI on every rising TCK edge but only 
while in the Shift-IR and Shift-DR states. 


4.48 TDO (Test Data Output)
Output


Summary TDO is the serial test data and instruction output for 
boundary-scan testing using the Test Access Port (TAP). See 
“Boundary-Scan Test Access Port (TAP)” on page 223 for details 
regarding the operation of the TAP controller. 


Driven and Floated The processor drives TDO on every falling TCK edge but only 
while in the Shift-IR and Shift-DR states. TDO is floated at all 
other times. 


4.49 TMS (Test Mode Select)
Input, Internal Pullup


Summary TMS specifies the test function and sequence of state changes 
for boundary-scan testing using the Test Access Port (TAP). See 
“Boundary-Scan Test Access Port (TAP)” on page 223 for details 
regarding the operation of the TAP controller. 


Sampled The processor samples TMS on every rising TCK edge. If TMS is 
sampled High for five or more consecutive clocks, the TAP 
controller enters its Test-Logic-Reset state, regardless of the 
controller state. This action is the same as that achieved by 
asserting TRST#. 
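
The five-clock rule can be checked against the TAP controller state diagram. The sketch below encodes the standard IEEE 1149.1 next-state table, which is not reproduced in this section and is included here as an assumption about the controller; the names `TAP_NEXT`, `step`, and `run` are hypothetical helpers for the example.

```python
# Standard IEEE 1149.1 TAP next-state table (assumed to apply here):
# state -> (next state if TMS = 0, next state if TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Advance one rising TCK edge with the sampled TMS value."""
    return TAP_NEXT[state][1 if tms else 0]


def run(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values, one per rising TCK edge."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# Five consecutive TMS = 1 samples reach Test-Logic-Reset from any state.
all_reset = all(run(s, [1] * 5) == "Test-Logic-Reset" for s in TAP_NEXT)
```

Five clocks is the worst case: starting from a Shift or Pause state, the controller needs all five TMS = 1 samples to reach Test-Logic-Reset, whereas most other states arrive sooner and then remain there.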
4.50 TRST# (Test Reset)
Input, Internal Pullup


Summary The assertion of TRST# initializes the Test Access Port (TAP) by 
resetting its state machine to the Test-Logic-Reset state. See 
“Boundary-Scan Test Access Port (TAP)” on page 223 for details 
regarding the operation of the TAP controller. 


Sampled TRST# is a completely asynchronous input that does not 
require a minimum setup and hold time relative to TCK. See 
Table 69 on page 280 for the minimum pulse width requirement. 


4.51 VCC2DET (Vcc2 Detect)
Output


Summary VCC2DET is internally tied to Vss (logic level 0) to indicate to 
the system logic that it must supply the specified dual-voltage 
requirements to the Vcc2 and Vcc3 pins. The Vcc2 pins supply 
voltage to the processor core, independent of the voltage 
supplied to the I/O buffers on the Vcc3 pins. Upon sampling 
VCC2DET Low, system logic should sample VCC2H/L# to 
identify core voltage requirements. 


Driven VCC2DET always equals 0 and is never floated, even during 
the Tri-State Test mode. 


4.52 VCC2H/L# (Vcc2 High/Low)
Output


Summary VCC2H/L# is internally tied to Vss (logic level 0) to indicate to 
the system logic that it must supply the specified processor core 
voltage to the Vcc2 pins. The Vcc2 pins supply voltage to the 
processor core, independent of the voltage supplied to the I/O 
buffers on the Vcc3 pins. Upon sampling VCC2DET Low to 
identify dual-voltage processor requirements, system logic 
should sample VCC2H/L# to identify the core voltage 
requirements for 2.9 V and 3.2 V products (High) or 2.2 V and 
2.4 V products (Low). 


Driven VCC2H/L# always equals 0 and is never floated for 2.2 V and 
2.4 V products, even during the Tri-State Test mode. To ensure 
proper operation for 2.9 V and 3.2 V products, system logic that 
samples VCC2H/L# should design a weak pullup resistor for 
this signal. 


Table 19. Output Pin Float Conditions 

Name Floated At: Note 
VCC2DET Always Driven * 
VCC2H/L# Always Driven * 
Note: 
* All outputs except VCC2DET, VCC2H/L#, and TDO float during the Tri-State Test mode. 


4.53 W/R# (Write/Read)
Output


Summary The processor drives W/R# to indicate whether it is performing 
a write or a read cycle on the bus. In addition, W/R# is used to 
define other bus cycles, including interrupt acknowledge and 
special cycles. See Table 25 on page 126 for more details. 


Driven and Floated W/R# is driven off the same clock edge as ADS# and remains in 
the same state until the clock edge on which NA# or the last 
expected BRDY# of the cycle is sampled asserted. W/R# is 
driven during memory cycles, I/O cycles, special bus cycles, and 
interrupt acknowledge cycles. 


W/R# is floated off the clock edge that BOFF# is sampled 
asserted and off the clock edge that the processor asserts HLDA 
in response to HOLD. 
4.54 WB/WT# (Writeback or Writethrough)
Input


Summary WB/WT#, together with PWT, specifies the data cache-line state 
during cacheable read misses and write hits to shared cache 
lines. 


If WB/WT# = 0 or PWT = 1 during a cacheable read miss or write 
hit to a shared cache line, the accessed line is cached in the 
shared state. This is referred to as the writethrough state 
because all write cycles to this cache line are driven externally 
on the bus. 


If WB/WT# = 1 and PWT = 0 during a cacheable read miss or a 
write hit to a shared cache line, the accessed line is cached in 
the exclusive state. Subsequent write hits to the same line 
cause its state to transition from exclusive to modified. This is 
referred to as the writeback state because the data cache can 
contain modified cache lines that are subject to be written 
back (referred to as a writeback cycle) as the result of an 
inquire cycle, an internal snoop, a flush operation, or the 
WBINVD instruction. 


Sampled WB/WT# is sampled on the clock edge that the first BRDY# or 
NA# of a bus cycle is sampled asserted. If the cycle is a burst 
read, WB/WT# is ignored during the last three assertions of 
BRDY#. WB/WT# is sampled during memory read and 
non-writeback write cycles and is ignored during all other types 
of cycles. 
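
The cache-line state rules given in the WB/WT# summary can be sketched as two small functions. This is an illustrative model only; the function names and the use of plain strings for the line states are assumptions made for the example, and the invalid state and snoop-driven transitions are deliberately left out.

```python
def line_state(wb_wt: int, pwt: int) -> str:
    """State assigned on a cacheable read miss or a write hit to a shared line."""
    if wb_wt == 0 or pwt == 1:
        return "shared"      # writethrough state: writes are driven on the bus
    return "exclusive"       # writeback state


def after_write_hit(state: str, wb_wt: int, pwt: int) -> str:
    """State of a cached line after a write hit."""
    if state == "shared":
        # WB/WT# and PWT are evaluated again on a write hit to a shared line
        return line_state(wb_wt, pwt)
    if state in ("exclusive", "modified"):
        return "modified"    # later write hits are kept in the cache
    raise ValueError("unsupported line state: " + state)
```

Both WB/WT# = 1 and PWT = 0 are needed for a line to leave the writethrough state, so either the system logic or the operating system can force writethrough behavior on its own.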
Name Type Note Name Type Note 
A20M# Asynchronous 1 IGNNE# Asynchronous 1 
AHOLD Synchronous INIT Asynchronous 2 
BF[2:0] Synchronous 4 INTR Asynchronous 1 
BOFF# Synchronous INV Synchronous 
BRDY# Synchronous KEN# Synchronous 
BRDYC# Synchronous 7 NA# Synchronous 
CLK Clock NMI Asynchronous 2 
EADS# Synchronous RESET Asynchronous 5,6 
EWBE# Synchronous 8 SMI# Asynchronous 2 
FLUSH# Asynchronous 2,3 STPCLK# Asynchronous 1 
HOLD Synchronous WB/WT# Synchronous 
Notes: 

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 
hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks. 

2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 
hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must 
remain asserted at least two clocks. 

3. FLUSH# is also sampled during the falling transition of RESET and can be asserted synchronously or asynchronously. To be 
sampled on a specific clock edge, setup and hold times must be met relative to the clock edge before the clock edge on which 
RESET is sampled negated. If asserted asynchronously, FLUSH# must meet a minimum setup and hold time of two clocks relative 
to the negation of RESET. 

4. BF[2:0] are sampled during the falling transition of RESET. They must meet a minimum setup time of 1.0 ms and a minimum hold 
time of two clocks relative to the negation of RESET. 

5. During the initial power-on reset of the processor, RESET must remain asserted for a minimum of 1.0 ms after CLK and Vcc reach 
specification before it is negated. 

6. During a warm reset, while CLK and Vcc are within their specification, RESET must remain asserted for a minimum of 15 clocks 
prior to its negation. 

7. BRDYC# is also sampled during the falling transition of RESET. If RESET is driven synchronously, BRDYC# must meet the specified 
hold time relative to the negation of RESET. If asserted asynchronously, BRDYC# must meet a minimum setup and hold time of 
two clocks relative to the negation of RESET. 

8. On the AMD-K6-2 processor Model 8/[F:8], if EFER[3] is set to 1, then EWBE# is ignored by the processor. 
Table 21. Output Pin Float Conditions 

Name Floated At: (Note 1) Note Name Floated At: (Note 1) Note 
A[4:3] HLDA, AHOLD, BOFF# 2,3 HLDA Always Driven 
ADS# HLDA, BOFF# 2 LOCK# HLDA, BOFF# 
ADSC# HLDA, BOFF# 2 M/IO# HLDA, BOFF# 
APCHK# Always Driven PCD HLDA, BOFF# 2 
BE[7:0]# HLDA, BOFF# 2 PCHK# Always Driven 
BREQ Always Driven PWT HLDA, BOFF# 2 
CACHE# HLDA, BOFF# 2 SCYC HLDA, BOFF# 2 
D/C# HLDA, BOFF# 2 SMIACT# Always Driven 
FERR# Always Driven VCC2DET Always Driven 
HIT# Always Driven VCC2H/L# Always Driven 
HITM# Always Driven W/R# HLDA, BOFF# 2 
Notes: 

1. All outputs except VCC2DET, VCC2H/L#, and TDO float during the Tri-State Test mode. 

2. Floated off the clock edge that BOFF# is sampled asserted and off the clock edge that HLDA is asserted. 

3. Floated off the clock edge that AHOLD is sampled asserted. 


Table 22. Input/Output Pin Float Conditions 

Name Floated At: (Note 1) Note 
A[31:5] HLDA, AHOLD, BOFF# 2,3 
AP HLDA, AHOLD, BOFF# 2,3 
D[63:0] HLDA, BOFF# 2 
DP[7:0] HLDA, BOFF# 2 
Notes: 

1. All outputs except VCC2DET and TDO float during the Tri-State Test mode. 

2. Floated off the clock edge that BOFF# is sampled asserted and off the clock edge that HLDA is asserted. 

3. Floated off the clock edge that AHOLD is sampled asserted. 


Table 23. Test Pins 

Name Type Note 
TCK Clock 
TDI Input Sampled on the rising edge of TCK 
TDO Output Driven on the falling edge of TCK 
TMS Input Sampled on the rising edge of TCK 
TRST# Input Asynchronous (Independent of TCK) 
Table 24. Bus Cycle Definition 

                                         Generated by the Processor    Generated by the System 
Bus Cycle Initiated                      M/IO#  D/C#  W/R#  CACHE#     KEN# 
Code Read, Instruction Cache Line Fill   1      0     0     0          0 
Code Read, Noncacheable                  1      0     0     1          X 
Code Read, Noncacheable                  1      0     0     X          1 
Encoding for Special Cycle               0      0     1     1          X 
Interrupt Acknowledge                    0      0     0     1          X 
I/O Read                                 0      1     0     1          X 
I/O Write                                0      1     1     1          X 
Memory Read, Data Cache Line Fill        1      1     0     0          0 
Memory Read, Noncacheable                1      1     0     1          X 
Memory Read, Noncacheable                1      1     0     X          1 
Memory Write, Data Cache Writeback       1      1     1     0          X 
Memory Write, Noncacheable               1      1     1     1          X 

Note: 
X means “don’t care” 


Table 25. Special Cycles 

Special Cycle                                A4  BE7#  BE6#  BE5#  BE4#  BE3#  BE2#  BE1#  BE0#  M/IO#  D/C#  W/R#  CACHE#  KEN# 
Stop Grant                                   1   1     1     1     1     1     0     1     1     0      0     1     1       X 
Flush Acknowledge (FLUSH# sampled asserted)  0   1     1     1     0     1     1     1     1     0      0     1     1       X 
Writeback (WBINVD instruction)               0   1     1     1     1     0     1     1     1     0      0     1     1       X 
Halt                                         0   1     1     1     1     1     0     1     1     0      0     1     1       X 
Flush (INVD, WBINVD instruction)             0   1     1     1     1     1     1     0     1     0      0     1     1       X 
Shutdown                                     0   1     1     1     1     1     1     1     0     0      0     1     1       X 


Note: 
X means “don’t care” 
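
System logic that decodes the cycle type from the status and byte-enable signals might follow the sketch below. It is illustrative only: the dictionary and function names are hypothetical, BE[7:0]# is packed into one byte with BE0# as bit 0, and the encodings are transcribed on the assumption that this bus uses the conventional values for these cycles. Verify each entry against Tables 24 and 25 before relying on it.

```python
# (M/IO#, D/C#, W/R#) -> bus cycle type (assumed encodings, see note above)
BUS_CYCLE = {
    (0, 0, 0): "Interrupt Acknowledge",
    (0, 0, 1): "Special Cycle",
    (0, 1, 0): "I/O Read",
    (0, 1, 1): "I/O Write",
    (1, 0, 0): "Code Read",
    (1, 1, 0): "Memory Read",
    (1, 1, 1): "Memory Write",
}

# (A4, BE[7:0]#) -> special cycle name (assumed encodings, see note above)
SPECIAL_CYCLE = {
    (1, 0xFB): "Stop Grant",
    (0, 0xEF): "Flush Acknowledge",
    (0, 0xF7): "Writeback",
    (0, 0xFB): "Halt",
    (0, 0xFD): "Flush",
    (0, 0xFE): "Shutdown",
}


def decode_cycle(m_io: int, d_c: int, w_r: int, a4: int = 0, be_n: int = 0xFF) -> str:
    """Name the bus cycle indicated by the status signals sampled with ADS#."""
    kind = BUS_CYCLE.get((m_io, d_c, w_r), "Undefined")
    if kind != "Special Cycle":
        return kind
    # Special cycles are told apart by A4 and the byte enables.
    name = SPECIAL_CYCLE.get((a4, be_n), "Undefined")
    return "Special Cycle: " + name
```

Halt and Stop Grant share the same byte-enable pattern (only BE2# asserted) and are distinguished by A4 alone, which is why A4 is part of the lookup key.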
Bus Cycles 





The following sections describe and illustrate the timing and 
relationship of bus signals during various types of bus cycles. A 
representative set of bus cycles is illustrated. 


5.1 Timing Diagrams 


The timing diagrams illustrate the signals on the external local 
bus as a function of time, as measured by the bus clock (CLK). 
Throughout this chapter, the term clock refers to a single 
bus-clock cycle. A clock extends from one rising CLK edge to 
the next rising CLK edge. The processor samples and drives 
most signals relative to the rising edge of CLK. The exceptions 
to this rule include the following: 


■ BF[2:0]: sampled on the falling edge of RESET 


■ FLUSH#, BRDYC#: sampled on the falling edge of RESET, 
also sampled on the rising edge of CLK 


■ All inputs and outputs are sampled relative to TCK in 
Boundary-Scan Test Mode. Inputs are sampled on the rising 
edge of TCK, outputs are driven off of the falling edge of 
TCK. 


For each signal in the timing diagrams, the High level 
represents 1, the Low level represents 0, and the Middle level 
represents the floating (high-impedance) state. When both the 
High and Low levels are shown, the meaning depends on the 
signal. For a single signal, it indicates ‘don’t care’. In the case of bus 
activity, if both High and Low levels are shown, it indicates the 
processor, alternate master, or system logic is driving a value, 
but this value may or may not be valid. (For example, the value 
on the address bus is valid only during the assertion of ADS#, 
but addresses are also driven on the bus at other times.) Figure 
53 defines the different waveform representations. 
Waveform Description 

[Figure: each waveform symbol is paired with one of the following descriptions.] 

■ Don't care or bus is driven 
■ Signal or bus is changing from Low to High 
■ Signal or bus is changing from High to Low 
■ Bus is changing 
■ Bus is changing from valid to invalid 
■ Signal or bus is floating 
■ Denotes multiple clock periods 


Figure 53. Waveform Definitions 


For all active-High signals, the term asserted means the signal is 
in the High-voltage state and the term negated means the signal 
is in the Low-voltage state. For all active-Low signals, the term 
asserted means the signal is in the Low-voltage state and the 
term negated means the signal is in the High-voltage state. 
5.2 Bus State Machine Diagram 


[Figure: flowchart of the bus states (Idle, Address, Data, Data-NA# Requested, 
Pipeline Address, Pipeline Data, and Transition) and the branch conditions 
between them (Pending Request?, Bus Transition?).] 


Note: The processor transitions to the IDLE state on the clock edge on which BOFF# or RESET is sampled asserted. 


Figure 54. Bus State Machine Diagram 
Idle The processor does not drive the system bus in the Idle state 
and remains in this state until a new bus cycle is requested. The 
processor enters this state off the clock edge on which the last 
BRDY# of a cycle is sampled asserted during the following 
conditions: 


■ The processor is in the Data state 


■ The processor is in the Data-NA# Requested state and no 
internal pending cycle is requested 


In addition, the processor is forced into this state when the 
system logic asserts RESET or BOFF#. The transition to this 
state occurs on the clock edge on which RESET or BOFF# is 
sampled asserted. 


Address In this state, the processor drives ADS# to indicate the 
beginning of a new bus cycle by validating the address and 
control signals. The processor remains in this state for one clock 
and unconditionally enters the Data state on the next clock 
edge. 


Data In the Data state, the processor drives the data bus during a 
write cycle or expects data to be returned during a read cycle. 
The processor remains in this state until either NA# or the last 
BRDY# is sampled asserted. If the last BRDY# is sampled 
asserted or both the last BRDY# and NA# are sampled asserted 
on the same clock edge, the processor enters the Idle state. If 
NA# is sampled asserted first, the processor enters the 
Data-NA# Requested state. 


Data-NA# Requested If the processor samples NA# asserted while in the Data state 
and the current bus cycle is not completed (the last BRDY# is 
not sampled asserted), it enters the Data-NA# Requested state. 
The processor remains in this state until either the last BRDY# 
is sampled asserted or an internal pending cycle is requested. If 
the last BRDY# is sampled asserted before the processor drives 
a new bus cycle, the processor enters the Idle state (no internal 
pending cycle is requested) or the Address state (processor has 
an internal pending cycle). 


Pipeline Address In this state, the processor drives ADS# to indicate the 
beginning of a new bus cycle by validating the address and 
control signals. In this state, the processor is still waiting for the 
current bus cycle to be completed (until the last BRDY# is 
sampled asserted). If the last BRDY# is not sampled asserted, 
the processor enters the Pipeline Data state. 


If the processor samples the last BRDY# asserted in this state, it 
determines if a bus transition is required between the current 
bus cycle and the pipelined bus cycle. A bus transition is 
required when the data bus direction changes between bus 
cycles, such as a memory write cycle followed by a memory read 
cycle. If a bus transition is required, the processor enters the 
Transition state for one clock to prevent data bus contention. If 
a bus transition is not required, the processor enters the Data 
state. 


The processor does not transition to the Data-NA# Requested 
state from the Pipeline Address state because the processor 
does not begin sampling NA# until it has exited the Pipeline 
Address state. 


Pipeline Data Two bus cycles are concurrently executing in this state. The 
processor cannot issue any additional bus cycles until the 
current bus cycle is completed. The processor drives the data 
bus during write cycles or expects data to be returned during 
read cycles for the current bus cycle until the last BRDY# of the 
current bus cycle is sampled asserted. 


If the processor samples the last BRDY# asserted in this state, it 
determines if a bus transition is required between the current 
bus cycle and the pipelined bus cycle. If the bus transition is 
required, the processor enters the Transition state for one clock 
to prevent data bus contention. If a bus transition is not 
required, the processor enters the Data state (NA# was not 
sampled asserted) or the Data-NA# Requested state (NA# was 
sampled asserted). 


Transition The processor enters this state for one clock during data bus 
transitions and enters the Data state on the next clock edge if 
NA# is not sampled asserted. The sole purpose of this state is to 
avoid bus contention caused by bus transitions during pipeline 
operation. 
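
The exit conditions described above for the Pipeline Data state can be summarized in a short sketch. The names BusState, needs_bus_transition, and leave_pipeline_data are illustrative and do not appear in the data sheet. The sketch assumes that the processor stays in the Pipeline Data state until the last BRDY# of the current cycle is sampled asserted, and that any change of data bus direction counts as a bus transition.

```python
from enum import Enum


class BusState(Enum):
    PIPELINE_DATA = "Pipeline Data"
    TRANSITION = "Transition"
    DATA = "Data"
    DATA_NA_REQUESTED = "Data-NA# Requested"


def needs_bus_transition(current_is_write: bool, pipelined_is_write: bool) -> bool:
    """A bus transition is required when the data bus direction changes."""
    return current_is_write != pipelined_is_write


def leave_pipeline_data(last_brdy: bool, current_is_write: bool,
                        pipelined_is_write: bool, na_sampled: bool) -> BusState:
    """Next state when the processor is in the Pipeline Data state."""
    if not last_brdy:
        # Assumption: the current bus cycle is still in progress.
        return BusState.PIPELINE_DATA
    if needs_bus_transition(current_is_write, pipelined_is_write):
        # One clock of Transition avoids data bus contention.
        return BusState.TRANSITION
    return BusState.DATA_NA_REQUESTED if na_sampled else BusState.DATA
```

For example, a write cycle followed by a pipelined read cycle yields BusState.TRANSITION once the last BRDY# is sampled asserted.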
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5.3 Memory Reads and Writes 


Single-Transfer 
Memory Read and 
Write 


The AMD-K6-2 processor performs single or burst memory bus 
cycles. The single-transfer memory bus cycle transfers 1, 2, 4, or 
8 bytes and requires a minimum of two clocks. Misaligned 
instructions or operands result in a split cycle, which requires 
multiple transactions on the bus. A burst cycle consists of four 
back-to-back 8-byte (64-bit) transfers on the data bus. 


Figure 55 shows a single-transfer read from memory, followed by 
two single-transfer writes to memory. For the memory read 
cycle, the processor asserts ADS# for one clock to validate the 
bus cycle and also drives A[31:3], BE[7:0]#, D/C#, W/R#, and 
M/IO# to the bus. The processor then waits for the system logic 
to return the data on D[63:0] (with DP[7:0] for parity checking) 
and assert BRDY#. The processor samples BRDY# on every clock 
edge starting with the clock edge after the clock edge that 
negates ADS#. See “BRDY# (Burst Ready)” on page 94. 


During the read cycle, the processor drives PCD, PWT, and 
CACHE# to indicate its caching and cache-coherency intent for 
the access. The system logic returns KEN# and WB/WT# to 
either confirm or change this intent. If the processor asserts 
PCD and negates CACHE#, the accesses are noncacheable, even 
though the system logic asserts KEN# during the BRDY# to 
indicate its support for cacheability. The processor (which 
drives CACHE#) and the system logic (which drives KEN#) must 
agree in order for an access to be cacheable. 
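
The agreement rule above reduces to a logical AND. The following minimal sketch uses a hypothetical function name and treats each argument as "signal sampled asserted" (both signals are active low on the bus).

```python
def access_is_cacheable(cache_asserted: bool, ken_asserted: bool) -> bool:
    """Cacheable only if the processor (CACHE#) and system logic (KEN#) agree."""
    return cache_asserted and ken_asserted
```

With CACHE# negated, the access stays noncacheable even when KEN# is asserted: access_is_cacheable(False, True) evaluates to False.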


The processor can drive another cycle (in this example, a write 
cycle) by asserting ADS# off the next clock edge after BRDY# is 
sampled asserted. Therefore, an idle clock is guaranteed 
between any two bus cycles. The processor drives D[63:0] with 
valid data one clock edge after the clock edge on which ADS# is 
asserted. To minimize processor idle times, the system logic 
stores the address and data in write buffers, returns BRDY#, and 
performs the store to memory later. If the processor samples 
EWBE# negated during a write cycle, it suspends certain 
activities until EWBE# is sampled asserted. See “EWBE# 
(External Write Buffer Empty)” on page 101. In Figure 55, the 
second write cycle occurs during the execution of a serializing 
instruction. The processor delays the following cycle until 
EWBE# is sampled asserted. 
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Read Cycle Write Cycle Write Cycle (Next Cycle Delayed by EWBE#) 
[Timing diagram omitted. Recoverable signal labels: A[31:3], BE[7:0]#, D[63:0], DP[7:0], KEN#.]


Figure 55. Non-Pipelined Single-Transfer Memory Read/Write and Write Delayed by EWBE# 
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Misaligned 
Single-Transfer 
Memory Read and 
Write 


Figure 56 shows a misaligned (split) memory read followed by a 
misaligned memory write. Any cycle that is not aligned as 
defined in “SCYC (Split Cycle)” on page 117 is considered 
misaligned. When the processor encounters a misaligned 
access, it determines the appropriate pair of bus cycles (each 
with its own ADS# and BRDY#) required to complete the 
access. 


The AMD-K6-2 processor performs misaligned memory reads 
and memory writes using least-significant bytes (LSBs) first 
followed by most-significant bytes (MSBs). Table 26 shows the 
order. In the first memory read cycle in Figure 56, the processor 
reads the least-significant bytes. Immediately after the 
processor samples BRDY# asserted, it drives the second bus 
cycle to read the most-significant bytes to complete the 
misaligned transfer. 


Table 26. Bus-Cycle Order During Misaligned Transfers 

Type of Access    First Cycle    Second Cycle 
Memory Read       LSBs           MSBs 
Memory Write      LSBs           MSBs 




















Similarly, the misaligned memory write cycle in Figure 56 on 
page 135 transfers the LSBs to the memory bus first. In the next 
cycle, after the processor samples BRDY# asserted, the MSBs 
are written to the memory bus. 
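
The LSB-first ordering can be illustrated with a short sketch that splits an access into its two bus cycles. The function names are hypothetical. As a simplifying assumption, the sketch splits an access only where it crosses an 8-byte quadword boundary; the actual definition of a misaligned access is given in the SCYC signal description and may be stricter.

```python
def byte_enables(offset: int, length: int) -> int:
    """Active-low BE[7:0]# value enabling `length` bytes from byte lane `offset`."""
    mask = ((1 << length) - 1) << offset  # 1 = byte lane in use
    return ~mask & 0xFF                   # BE[7:0]# is active low


def split_misaligned(address: int, size: int) -> list:
    """Return [(quadword_address, BE#), ...] in bus order: LSBs first, then MSBs.

    Assumption for illustration: the split point is the 8-byte boundary.
    """
    offset = address & 0x7
    first_len = min(size, 8 - offset)
    base = address & ~0x7
    cycles = [(base, byte_enables(offset, first_len))]
    if first_len < size:
        # Second cycle carries the most-significant bytes in the next quadword.
        cycles.append((base + 8, byte_enables(0, size - first_len)))
    return cycles
```

Under these assumptions, a 4-byte access at address 1006h produces a first cycle at 1000h with BE[7:0]# = 3Fh and a second cycle at 1008h with BE[7:0]# = FCh.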
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| Memory Read (Misaligned) Memory Write (Misaligned) 
"ADDR DATA DATA IDLE ADDR DATA DATA IDLE ADDR DATA DATA DATA IDLE ADDR DATA DATA DATA IDLE 


[Timing diagram omitted. Recoverable signal labels: W/R#, D[63:0] (LSB then MSB for each misaligned access), BRDY#.]


Figure 56. Misaligned Single-Transfer Memory Read and Write 
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Burst Reads and 
Pipelined Burst Reads 


Figure 57 shows normal burst read cycles and a pipelined burst 
read cycle. The AMD-K6-2 processor drives CACHE# and ADS# 
together to specify that the current bus cycle is a burst cycle. If 
the processor samples KEN# asserted with the first BRDY#, it 
performs burst transfers. During the burst transfers, the system 
logic must ignore BE[7:0]# and must return all eight bytes 
beginning at the starting address the processor asserts on 
A[31:3]. Depending on the starting address, the system logic 
must determine the successive quadword addresses (A[4:3]) for 
each transfer in a burst, as shown in Table 27. The processor 
expects the second, third, and fourth quadwords to occur in the 
sequences shown in Table 27. 


Table 27. A[4:3] Address-Generation Sequence During Bursts 

Address Driven By        A[4:3] Addresses of Subsequent Quadwords* 
Processor on A[4:3]      Generated By System Logic 

Quadword 1               Quadword 2    Quadword 3    Quadword 4 
00b                      01b           10b           11b 
01b                      00b           11b           10b 
10b                      11b           00b           01b 
11b                      10b           01b           00b 

Note: 
* quadword = 8 bytes 
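
Each row of Table 27 equals the starting A[4:3] value XORed with the transfer index 0 through 3. This is an observation about the table rather than a statement made by the data sheet, and the function name below is hypothetical.

```python
def burst_sequence(first_a4_3: int) -> list:
    """A[4:3] values of the four quadwords in a burst, in transfer order."""
    return [first_a4_3 ^ index for index in range(4)]


def burst_addresses(start_address: int) -> list:
    """Byte addresses of the four quadwords in the 32-byte line being filled."""
    line_base = start_address & ~0x1F
    first = (start_address >> 3) & 0b11
    return [line_base | (a4_3 << 3) for a4_3 in burst_sequence(first)]
```

A burst that starts at A[4:3] = 10b therefore proceeds through 10b, 11b, 00b, and 01b.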











In Figure 57, the processor drives CACHE# throughout all burst 
read cycles. In the first burst read cycle, the processor drives 
ADS# and CACHE#, then samples BRDY# on every clock edge 
starting with the clock edge after the clock edge that negates 
ADS#. The processor samples KEN# asserted on the clock edge 
on which the first BRDY# is sampled asserted, executes a 
32-byte burst read cycle, and expects a total of four BRDY# 
signals. An ideal no-wait state access is shown in Figure 57, 
whereas most system logic solutions add wait states between 
the transfers. 


The second burst read cycle illustrates a similar sequence, but 
the processor samples NA# asserted on the same clock edge 
that the first BRDY# is sampled asserted. NA# assertion 
indicates the system logic is requesting the processor to output 
the next address early (also known as a pipeline transfer 
request). Without waiting for the current cycle to complete, the 
processor drives ADS# and related signals for the next burst 
cycle. Pipelining can reduce processor cycle-to-cycle idle times. 
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Burst Read | Burst Read Pipelined Burst Read 
[Timing diagram omitted. Recoverable signal labels: NA#, D[63:0], CACHE#, KEN#, BRDY#.]


Figure 57. Burst Reads and Pipelined Burst Reads 
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Burst Writeback 


Figure 58 shows a burst read followed by a writeback 
transaction. The AMD-K6-2 processor initiates writebacks 
under the following conditions: 


■ Replacement: If a cache-line fill is initiated for a cache line 
  currently filled with valid entries, the processor selects a 
  line for replacement based on a least-recently-used (LRU) 
  algorithm for the instruction cache, and a 
  least-recently-allocated (LRA) algorithm for the data cache. 
  Before a replacement is made to an L1 data cache line that is 
  in the modified state, the modified line is scheduled to be 
  written back to memory. 


■ Internal Snoop: The processor snoops its instruction cache 
  during read or write misses to its data cache, and it snoops 
  its data cache during read misses to its instruction cache. 
  This snooping is performed to determine whether the same 
  address is stored in both caches, a situation that is taken to 
  imply the occurrence of self-modifying code. If a snoop hits a 
  data cache line in the modified state, the line is written back 
  to memory before being invalidated. 


■ WBINVD Instruction: When the processor executes a 
  WBINVD instruction, it writes back all modified lines in the 
  data cache and then invalidates all lines in both caches. 


■ Cache Flush: When the processor samples FLUSH# 
  asserted, it executes a flush acknowledge special cycle and 
  writes back all modified lines in the data cache and then 
  invalidates all lines in both caches. 


The processor drives writeback cycles during inquire or cache 
flush cycles. The writeback shown in Figure 58 is caused by a 
cache-line replacement. The processor completes the burst read 
cycle that fills the cache line. Immediately following the burst 
read cycle is the burst writeback cycle that represents the 
modified line to be written back to memory. D[63:0] are driven 
one clock edge after the clock edge on which ADS# is asserted 
and are subsequently changed off the clock edge on which each 
of the four BRDY# signals of the burst cycle are sampled 
asserted. 
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Burst Read Burst Writeback from L1 Cache 
[Timing diagram omitted. Recoverable signal labels: BE[7:0]#, ADS#, CACHE#, M/IO#, D/C#, W/R#, D[63:0], KEN#, BRDY#, WB/WT#.]


Figure 58. Burst Writeback due to Cache-Line Replacement 







Preliminary Information 





AMD-K6®-2 Processor Data Sheet 21850J/0—February 2000 


5.4 I/O Read and Write 


Basic I/O Read and 
Write 




The processor accesses I/O when it executes an I/O instruction 
(for example, IN or OUT). Figure 59 shows an I/O read followed 
by an I/O write. The processor drives M/IO# Low and D/C# High 
during I/O cycles. In this example, the first cycle shows a single 
wait state I/O read cycle. It follows the same sequence as a 
single-transfer memory read cycle. The processor drives ADS# 
to initiate the bus cycle, then it samples BRDY# on every clock 
edge starting with the clock edge after the clock edge that 
negates ADS#. The system logic must return BRDY# to 
complete the cycle. When the processor samples BRDY# 
asserted, it can assert ADS# for the next cycle off the next clock 
edge. (In this example, an I/O write cycle.) 


The I/O write cycle is similar to a memory write cycle, but the 
processor drives M/IO# low during an I/O write cycle. The 
processor asserts ADS# to initiate the bus cycle. The processor 
drives D[63:0] with valid data one clock edge after the clock 
edge on which ADS# is asserted. The system logic must assert 
BRDY# when the data is properly stored to the I/O destination. 
The processor samples BRDY# on every clock edge starting with 
the clock edge after the clock edge that negates ADS#. In this 
example, two wait states are inserted while the processor waits 
for BRDY# to be asserted. 


[Timing diagram omitted. Panels: I/O Read Cycle, I/O Write Cycle. Recoverable signal labels: BE[7:0]#, ADS#, M/IO#, D/C#, W/R#, D[63:0], BRDY#.]


Figure 59. Basic I/O Read and Write 
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Misaligned I/O Read 
and Write 


Table 28 shows the misaligned I/O read and write cycle order 
executed by the AMD-K6-2 processor. In Figure 60, the 
least-significant bytes (LSBs) are transferred first. Immediately 
after the processor samples BRDY# asserted, it drives the 
second bus cycle to transfer the most-significant bytes (MSBs) 
to complete the misaligned bus cycle. 


Table 28. Bus-Cycle Order During Misaligned I/O Transfers 

Type of Access    First Cycle    Second Cycle 
I/O Read          LSBs           MSBs 
I/O Write         LSBs           MSBs 














[Timing diagram omitted. Panels: Misaligned I/O Read, Misaligned I/O Write.]


Figure 60. Misaligned I/O Transfer 
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5.5 Inquire and Bus Arbitration Cycles 


Hold and Hold 
Acknowledge Cycle 


The AMD-K6-2 processor provides built-in level-one data and 
instruction caches. Each cache is 32 Kbytes and two-way 
set-associative. The system logic or other bus master devices 
can initiate an inquire cycle to maintain cache/memory 
coherency. In response to the inquire cycle, the processor 
compares the inquire address with its cache tag addresses in 
both caches, and, if necessary, updates the MESI state of the 
cache line and performs writebacks to memory. 


An inquire cycle can be initiated by asserting AHOLD, BOFF#, 
or HOLD. AHOLD is exclusively used to support inquire cycles. 
During AHOLD-initiated inquire cycles, the processor only 
floats the address bus. BOFF# provides the fastest access to the 
bus because it aborts any processor cycle that is in-progress, 
whereas AHOLD and HOLD both permit an in-progress bus 
cycle to complete. During HOLD-initiated and BOFF#-initiated 
inquire cycles, the processor floats all of its bus-driving signals. 


The system logic or another bus device can assert HOLD to 
initiate an inquire cycle or to gain full control of the bus. When 
the AMD-K6-2 processor samples HOLD asserted, it completes 
any in-progress bus cycle and asserts HLDA to acknowledge 
release of the bus. The processor floats the following signals off 
the same clock edge that HLDA is asserted: 

A[31:3]     DP[7:0] 
ADS#        LOCK# 
AP          M/IO# 
BE[7:0]#    PCD 
CACHE#      PWT 
D[63:0]     SCYC 
D/C#        W/R# 


Figure 61 shows a basic HOLD/HLDA operation. In this 
example, the processor samples HOLD asserted during the 
memory read cycle. It continues the current memory read cycle 
until BRDY# is sampled asserted. The processor drives HLDA 
and floats its outputs one clock edge after the last BRDY# of the 
cycle is sampled asserted. The system logic can assert HOLD for 
as long as it needs to utilize the bus. The processor samples 
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HOLD on every clock edge but does not assert HLDA until any 
in-progress cycle or sequence of locked cycles is completed. 


When the processor samples HOLD negated during a hold 
acknowledge cycle, it negates HLDA off the next clock edge. 
The processor regains control of the bus and can assert ADS# 
off the same clock edge on which HLDA is negated. 


[Timing diagram omitted. Recoverable signal labels: A[31:3], BE[7:0]#, ADS#, D/C#, W/R#.]


Figure 61. Basic HOLD/HLDA Operation 
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HOLD-Initiated 
Inquire Hit to Shared 
or Exclusive Line 


Figure 62 shows a HOLD-initiated inquire cycle. In this 
example, the processor samples HOLD asserted during the 
burst memory read cycle. The processor completes the current 
cycle (until the last expected BRDY# is sampled asserted), 
asserts HLDA, and floats its outputs as described on page 142. 


The system logic drives an inquire cycle within the hold 
acknowledge cycle. It asserts EADS#, which validates the 
inquire address on A[31:5]. If EADS# is sampled asserted 
before HOLD is sampled negated, the processor recognizes it as 
a valid inquire cycle. 


In Figure 62, the processor asserts HIT# and negates HITM# on 
the clock edge after the clock edge on which EADS# is sampled 
asserted, indicating the current inquire cycle hit a shared or 
exclusive cache line. (Shared and exclusive cache lines have not 
been modified and do not need to be written back.) During an 
inquire cycle, the processor samples INV to determine whether 
the addressed cache line found in the processor’s instruction or 
data cache transitions to the invalid state or the shared state. In 
this example, the processor samples INV asserted with EADS#, 
which invalidates the cache line. 


The system logic can negate HOLD off the same clock edge on 
which EADS# is sampled asserted. The processor continues 
driving HIT# in the same state until the next inquire cycle. 
HITM# is not asserted unless HIT# is asserted. 
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| Burst Memory Read | Inquire 


[Timing diagram omitted. Recoverable signal labels: CLK, A[31:3], BE[7:0]#, ADS#.]


Figure 62. HOLD-Initiated Inquire Hit to Shared or Exclusive Line 
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HOLD-Initiated 
Inquire Hit to 
Modified Line 


Figure 63 shows the same sequence as Figure 62, but in Figure 
63 the inquire cycle hits a modified line and the processor 
asserts both HIT# and HITM#. In this example, the processor 
performs a writeback cycle immediately after the inquire cycle. 
It updates the modified cache line to external memory 
(normally, external cache or DRAM). The processor uses the 
address (A[31:5]) that was latched during the inquire cycle to 
perform the writeback cycle. The processor asserts HITM# 
throughout the writeback cycle and negates HITM# one clock 
edge after the last expected BRDY# of the writeback is sampled 
asserted. 


When the processor samples EADS# during the inquire cycle, it 
also samples INV to determine the cache line MESI state after 
the inquire cycle. If INV is sampled asserted during an inquire 
cycle, the processor transitions the line (if found) to the invalid 
state, regardless of its previous state. The cache line 
invalidation operation is not visible on the bus. If INV is 
sampled negated during an inquire cycle, the processor 
transitions the line (if found) to the shared state. In Figure 63 
the processor samples INV asserted during the inquire cycle. 
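
The inquire response described in this section can be sketched as a small function. The names Mesi and inquire_response are illustrative. The model assumes a single cache line tracked with full MESI state and reports HIT# and HITM# as logical "asserted" flags rather than electrical levels.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def inquire_response(state: Mesi, inv_asserted: bool) -> tuple:
    """Return (hit, hitm, writeback, new_state) for one inquire cycle."""
    if state is Mesi.INVALID:
        # Miss: HIT# and HITM# stay negated and nothing changes.
        return (False, False, False, Mesi.INVALID)
    modified = state is Mesi.MODIFIED
    # INV asserted invalidates the line; INV negated leaves it shared.
    new_state = Mesi.INVALID if inv_asserted else Mesi.SHARED
    # A modified line is written back before its state changes.
    return (True, modified, modified, new_state)
```

For the case shown in Figure 63, inquire_response(Mesi.MODIFIED, True) reports a hit with HITM# asserted, a writeback, and a final state of invalid.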


In a HOLD-initiated inquire cycle, the system logic can negate 
HOLD off the same clock edge on which EADS# is sampled 
asserted. The processor drives HIT# and HITM# on the clock 
edge after the clock edge on which EADS# is sampled asserted. 
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Burst Memory Read Inquire Writeback Cycle 


[Timing diagram omitted. Recoverable signal labels: CLK, ADS#, M/IO#, D/C#, W/R#, HIT#, HITM#, D[63:0], KEN#, BRDY#, HOLD, HLDA, EADS#, INV.]


Figure 63. HOLD-Initiated Inquire Hit to Modified Line 
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AHOLD-Initiated 
Inquire Miss 


AHOLD can be asserted by the system to initiate one or more 
inquire cycles. To allow the system to drive the address bus 
during an inquire cycle, the processor floats A[31:3] and AP off 
the clock edge on which AHOLD is sampled asserted. The data 
bus and all other control and status signals remain under the 
control of the processor and are not floated. This functionality 
allows a bus cycle in progress when AHOLD is sampled asserted 
to continue to completion. The processor resumes driving the 
address bus off the clock edge on which AHOLD is sampled 
negated. 


In Figure 64, the processor samples AHOLD asserted during the 
memory burst read cycle, and it floats the address bus off the 
same clock edge on which it samples AHOLD asserted. While 
the processor still controls the bus, it completes the current 
cycle until the last expected BRDY# is sampled asserted. The 
system logic drives EADS# with an inquire address on A[31:5] 
during an inquire cycle. The processor samples EADS# asserted 
and compares the inquire address to its tag address in both the 
instruction and data caches. In Figure 64, the inquire address 
misses the tag address in the processor (both HIT# and HITM# 
are negated). Therefore, the processor proceeds to the next 
cycle when it samples AHOLD negated. (The processor can 
drive a new cycle by asserting ADS# off the same clock edge that 
it samples AHOLD negated.) 


For an AHOLD-initiated inquire cycle to be recognized, the 
processor must sample AHOLD asserted for at least two 
consecutive clocks before it samples EADS# asserted. If the 
processor detects an address parity error during an inquire 
cycle, APCHK# is asserted for one clock. The system logic must 
respond appropriately to the assertion of this signal. 
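
The two-clock requirement can be checked against a trace of per-clock samples. This is a sketch under one interpretation, stated here as an assumption: AHOLD must have been sampled asserted on the two clock edges immediately preceding the edge on which EADS# is sampled asserted. The function name and the list-of-booleans trace format are hypothetical.

```python
def recognized_inquires(ahold: list, eads: list) -> list:
    """Clock indices where EADS# is asserted and the AHOLD rule is met.

    ahold[i] and eads[i] are True when the signal is sampled asserted
    on clock edge i.
    """
    recognized = []
    for clock, eads_asserted in enumerate(eads):
        if not eads_asserted or clock < 2:
            continue
        # Assumed rule: AHOLD asserted on the two preceding clock edges.
        if ahold[clock - 1] and ahold[clock - 2]:
            recognized.append(clock)
    return recognized
```

In a trace where AHOLD is first sampled asserted on clock 1, an EADS# on clock 2 does not satisfy the rule, while an EADS# on clock 3 does.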
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Inquire 





[Timing diagram omitted. Recoverable signal labels: APCHK#, ADS#, HIT#, HITM#, D[63:0], KEN#, BRDY#, EADS#, INV.]


Figure 64. AHOLD-Initiated Inquire Miss 
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AHOLD-Initiated 
Inquire Hit to Shared 
or Exclusive Line 


In Figure 65, the processor asserts HIT# and negates HITM# off 
the clock edge after the clock edge on which EADS# is sampled 
asserted, indicating the current inquire cycle hits either a 
shared or exclusive line. (HIT# is driven in the same state until 
the next inquire cycle.) The processor samples INV asserted 
during the inquire cycle and transitions the line to the invalid 
state regardless of its previous state. 


During an AHOLD-initiated inquire cycle, the processor 
samples AHOLD on every clock edge until it is negated. In 
Figure 65, the processor asserts ADS# off the same clock on 
which AHOLD is sampled negated. If the inquire cycle hits a 
modified line, the processor performs a writeback cycle before 
it drives a new bus cycle. The next section describes the 
AHOLD-initiated inquire cycle that hits a modified line. 
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Burst Memory Read Inquire 


[Timing diagram omitted. Recoverable signal labels: CLK, A[31:3], BE[7:0]#, ADS#, M/IO#, D/C#, W/R#, HIT#, HITM#, D[63:0], KEN#, BRDY#, AHOLD, EADS#, INV.]


Figure 65. AHOLD-Initiated Inquire Hit to Shared or Exclusive Line 
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AHOLD-Initiated 
Inquire Hit to 
Modified Line 


Figure 66 shows an AHOLD-initiated inquire cycle that hits a 
modified line. During the inquire cycle in this example, the 
processor asserts both HIT# and HITM# on the clock edge after 
the clock edge that it samples EADS# asserted. This condition 
indicates that the cache line exists in the processor’s data cache 
in the modified state. 


If the inquire cycle hits a modified line, the processor performs 
a writeback cycle immediately after the inquire cycle to update 
the modified cache line to shared memory (normally external 
cache or DRAM). In Figure 66, the system logic holds AHOLD 
asserted throughout the inquire cycle and the processor 
writeback cycle. In this case, the processor is not driving the 
address bus during the writeback cycle because AHOLD is 
sampled asserted. The system logic writes the data to memory 
by using its latched copy of the inquire cycle address. If the 
processor samples AHOLD negated before it performs the 
writeback cycle, it drives the writeback cycle by using the 
address (A[31:5]) that it latched during the inquire cycle. 


If INV is sampled asserted during an inquire cycle, the 
processor transitions the line (if found) to the invalid state, 
regardless of its previous state (the cache invalidation 
operation is not visible on the bus). If INV is sampled negated 
during an inquire cycle, the processor transitions the line (if 
found) to the shared state. In either case, if the line is found in 
the modified state, the processor writes it back to memory 
before changing its state. Figure 66 shows that the processor 
samples INV asserted during the inquire cycle and invalidates 
the cache line after the inquire cycle. 
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Burst Memory Read Inquire Writeback 
[Timing diagram omitted. Recoverable signal labels: CLK, A[31:3], BE[7:0]#, D[63:0], BRDY#, INV.]


Figure 66. AHOLD-Initiated Inquire Hit to Modified Line 
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AHOLD Restriction 


When the system logic drives an AHOLD-initiated inquire 
cycle, it must assert AHOLD for at least two clocks before it 
asserts EADS#. This requirement guarantees the processor 
recognizes and responds to the inquire cycle properly. The 
processor’s 32 address bus drivers turn on almost immediately 
after AHOLD is sampled negated. If the processor switches the 
data bus (D[63:0] and DP[7:0]) during a write cycle off the same 
clock edge that switches the address bus (A[31:3] and AP), the 
processor switches 102 drivers simultaneously, which can lead 
to ground-bounce spikes. Therefore, before negating AHOLD 
the following restrictions must be observed by the system logic: 


■ When the system logic negates AHOLD during a write cycle, 
  it must ensure that AHOLD is not sampled negated on the 
  clock edge on which BRDY# is sampled asserted (see Figure 
  67). 


■ When the system logic negates AHOLD during a writeback 
  cycle, it must ensure that AHOLD is not sampled negated on 
  the clock edge on which ADS# is negated (see Figure 67). 


■ When a write cycle is pipelined into a read cycle, AHOLD 
  must not be sampled negated on the clock edge after the 
  clock edge on which the last BRDY# of the read cycle is 
  sampled asserted to avoid the processor simultaneously 
  driving the data bus (for the pending write cycle) and the 
  address bus off this same clock edge. 
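
The figure of 102 simultaneously switching drivers follows from the widths of the signal groups named in this section: D[63:0], DP[7:0], A[31:3], and AP. The short sketch below reproduces that arithmetic; the function names are hypothetical.

```python
def bus_width(msb: int, lsb: int) -> int:
    """Number of signals in a bus written as NAME[msb:lsb]."""
    return msb - lsb + 1


def simultaneous_drivers() -> int:
    """Drivers that switch if the data and address buses turn on together."""
    data = bus_width(63, 0) + bus_width(7, 0)  # D[63:0] plus DP[7:0]
    address = bus_width(31, 3) + 1             # A[31:3] plus AP
    return data + address
```

The result is 64 + 8 + 29 + 1 = 102, which is the case the three restrictions are meant to prevent.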
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The system must ensure that AHOLD is not sampled negated on the clock edge on which BRDY# is sampled 
asserted. 


Figure 67. AHOLD Restriction 
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Bus Backoff (BOFF#) 


BOFF# provides the fastest response among bus-hold inputs. 
Either the system logic or another bus master can assert BOFF# 
to gain control of the bus immediately. BOFF# is also used to 
resolve potential deadlock problems that arise as a result of 
inquire cycles. The processor samples BOFF# on every clock 
edge. If BOFF# is sampled asserted, the processor 
unconditionally aborts any cycles in progress and transitions to 
a bus hold state. (See “BOFF# (Backoff)” on page 93.) Figure 68 
shows a read cycle that is aborted when the processor samples 
BOFF# asserted even though BRDY# is sampled asserted on the 
same clock edge. The read cycle is restarted after BOFF# is 
sampled negated (KEN# must be in the same state during the 
restarted cycle as its state during the aborted cycle). 


During a BOFF#-initiated inquire cycle that hits a shared or 
exclusive line, the processor samples BOFF# negated and 
restarts any bus cycle that was aborted when BOFF# was 
asserted. If a BOFF#-initiated inquire cycle hits a modified line, 
the processor performs a writeback cycle before it restarts the 
aborted cycle. 


If the processor samples BOFF# asserted on the same clock 
edge that it asserts ADS#, ADS# is floated but the system logic 
may erroneously interpret ADS# as asserted. In this case, the 
system logic must properly interpret the state of ADS# when 
BOFF# is negated. 
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Figure 68. BOFF# Timing 
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Locked Cycles 


Basic Locked 
Operation 


The processor asserts LOCK# during a sequence of bus cycles to 
ensure the cycles are completed without allowing other bus 
masters to intervene. Locked operations can consist of two to 
five cycles. LOCK# is asserted during the following operations: 


■ An interrupt acknowledge sequence 
■ Descriptor Table accesses 
■ Page Directory and Page Table accesses 
■ XCHG instruction 
■ An instruction with an allowable LOCK prefix 


In order to ensure that locked operations appear on the bus and 
are visible to the entire system, any data operands addressed 
during a locked cycle that reside in the processor’s cache are 
flushed and invalidated from the cache prior to the locked 
operation. If the cache line is in the modified state, it is written 
back and invalidated prior to the locked operation. Likewise, 
any data read during a locked operation is not cached. The 
processor negates LOCK# for at least one clock between 
consecutive sequences of locked operations to allow the system 
logic to arbitrate for the bus. 


The processor asserts SCYC during misaligned locked transfers 
on the D[63:0] data bus. The processor generates additional bus 
cycles to complete the transfer of misaligned data. 


Figure 69 shows a pair of read-write bus cycles. It represents a 
typical read-modify-write locked operation. The processor 
asserts LOCK# off the same clock edge that it asserts ADS# of 
the first bus cycle in the locked operation and holds it asserted 
until the last expected BRDY# of the last bus cycle in the locked 
operation is sampled asserted. (The processor negates LOCK# 
off the same clock edge.) 
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Locked Read Cycle | Locked Write Cycle | 
ADDR DATA DATA DATA IDLE IDLE ADDR DATA DATA DATA IDLE IDLE ADDR 








Figure 69. Basic Locked Operation 
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Locked Operation 
with BOFF# 
Intervention 


Figure 70 shows BOFF# asserted within a locked read-write pair 
of bus cycles. In this example, the processor asserts LOCK# 
with ADS# to drive a locked memory read cycle followed by a 
locked memory write cycle. During the locked memory write 
cycle in this example, the processor samples BOFF# asserted. 
The processor immediately aborts the locked memory write 
cycle and floats all its bus-driving signals, including LOCK#. 
The system logic or another bus master can initiate an inquire 
cycle or drive a new bus cycle one clock edge after the clock 
edge on which BOFF# is sampled asserted. If the system logic 
drives a BOFF#-initiated inquire cycle and hits a modified line, 
the processor performs a writeback cycle before it restarts the 
locked cycle (the processor asserts LOCK# during the 
writeback cycle). 


In Figure 70, the processor immediately restarts the aborted 
locked write cycle by driving the bus off the clock edge on 
which BOFF# is sampled negated. The system logic must ensure 
the processor results for interrupted and uninterrupted locked 
cycles are consistent. That is, the system logic must guarantee 
the memory accessed by the processor is not modified during 
the time another bus master controls the bus. 
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Figure 70. Locked Operation with BOFF# Intervention 
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Interrupt 
Acknowledge 


In response to recognizing the system’s maskable interrupt 
(INTR), the processor drives an interrupt acknowledge cycle at 
the next instruction boundary. During an interrupt 
acknowledge cycle, the processor drives a locked pair of read 
cycles as shown in Figure 71. The first read cycle is not 
functional, and the second read cycle returns the interrupt 
number on D[7:0] (00h-FFh). Table 29 shows the state of the 
signals during an interrupt acknowledge cycle. 


Table 29. Interrupt Acknowledge Operation Definition 

Processor Outputs    First Bus Cycle    Second Bus Cycle 
D/C#                 Low                Low 
M/IO#                Low                Low 
W/R#                 Low                Low 
BE[7:0]#             EFh                FEh (low byte enabled) 
A[31:3]              0000_0000h         0000_0000h 
D[63:0]              (ignored)          Interrupt number expected from interrupt 
                                        controller on D[7:0] 
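
The byte-enable values in Table 29 are active low, so the one bit that is 0 selects the byte lane in use. The sketch below decodes a BE[7:0]# value and extracts the interrupt number from the data returned in the second cycle. Both function names are hypothetical.

```python
def enabled_byte_lanes(be_n: int) -> list:
    """Byte lanes selected by an active-low BE[7:0]# value."""
    return [lane for lane in range(8) if not (be_n >> lane) & 1]


def interrupt_vector(data_bus: int) -> int:
    """Interrupt number carried on D[7:0] during the second read cycle."""
    return data_bus & 0xFF
```

Decoding EFh gives byte lane 4 for the first cycle, and FEh gives byte lane 0 for the second cycle, which is where the interrupt number is returned.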

















The system logic can drive INTR either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. To ensure it 
is recognized, INTR must remain asserted until an interrupt 
acknowledge sequence is complete. 







Interrupt Acknowledge Cycles 
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Figure 71. Interrupt Acknowledge Operation 
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5.6 Special Bus Cycles 


Basic Special Bus 
Cycle 


The AMD-K6-2 processor drives special bus cycles that include 
stop grant, flush acknowledge, cache writeback invalidation, 
halt, cache invalidation, and shutdown cycles. During all 
special cycles, D/C# = 0, M/IO# = 0, and W/R# = 1. BE[7:0]# and 
A[31:3] are driven to differentiate among the special cycles, as 
shown in Table 30. The system logic must return BRDY# in 
response to all processor special cycles. 


Table 30. Encodings For Special Bus Cycles 





























BE[7:0]# | A[4:3]* | Special Bus Cycle | Cause
FBh      | 10b     | Stop Grant        | STPCLK# sampled asserted
EFh      | 00b     | Flush Acknowledge | FLUSH# sampled asserted
F7h      | 00b     | Writeback         | WBINVD instruction
FBh      | 00b     | Halt              | HLT instruction
FDh      | 00b     | Flush             | INVD, WBINVD instruction
FEh      | 00b     | Shutdown          | Triple fault

Note:
* A[31:5] = 0











Figure 72 shows a basic special bus cycle. The processor drives 
D/C# = 0, M/IO# = 0, and W/R# = 1 off the same clock edge that 
it asserts ADS#. In this example, BE[7:0]# = FBh and A[31:3] = 
0000_0000h, which indicates that the special cycle is a halt 
special cycle (See Table 30). A halt special cycle is generated 
after the processor executes the HLT instruction. 
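
The encodings in Table 30 lend themselves to a simple lookup. The following is a minimal sketch of how bus-monitoring software might classify a special cycle; the function name `decode_special_cycle` and its argument layout are hypothetical, and the Stop Grant entry assumes A[4:3] = 10b (the extracted table row is damaged at that position).

```python
# Table 30 encodings keyed by (BE[7:0]#, A[4:3]).
# Assumption: Stop Grant uses A[4:3] = 10b, which distinguishes it from Halt.
SPECIAL_CYCLES = {
    (0xFB, 0b10): "Stop Grant",
    (0xEF, 0b00): "Flush Acknowledge",
    (0xF7, 0b00): "Writeback",
    (0xFB, 0b00): "Halt",
    (0xFD, 0b00): "Flush",
    (0xFE, 0b00): "Shutdown",
}


def decode_special_cycle(dc, mio, wr, be, address):
    """Return the special bus cycle name, or None if this is not one.

    dc, mio, wr are the sampled D/C#, M/IO#, and W/R# levels (0 or 1),
    be is the BE[7:0]# byte, and address is the 32-bit byte address
    (only A[31:3] are meaningful on the bus).
    """
    # All special cycles drive D/C# = 0, M/IO# = 0, and W/R# = 1.
    if (dc, mio, wr) != (0, 0, 1):
        return None
    # The table note requires A[31:5] = 0.
    if address >> 5:
        return None
    a43 = (address >> 3) & 0b11
    return SPECIAL_CYCLES.get((be & 0xFF, a43))
```

For example, `decode_special_cycle(0, 0, 1, 0xFB, 0x0000_0000)` classifies the halt cycle described above, while the same byte enables with W/R# = 0 are not a special cycle at all.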


If the processor samples FLUSH# asserted, it writes back any 
data cache lines that are in the modified state and invalidates 
all lines in the instruction and data cache. The processor then 
drives a flush acknowledge special cycle. 


If the processor executes a WBINVD instruction, it drives a 
writeback special cycle after the processor completes 
invalidating and writing back the cache lines.

Figure 72. Basic Special Bus Cycle (Halt Cycle)

Shutdown Cycle

In Figure 73, a shutdown (triple fault) occurs in the first half of
the waveform, and a shutdown special cycle follows in the
second half. The processor enters shutdown when an interrupt
or exception occurs during the handling of a double fault (INT
8), which amounts to a triple fault. When the processor
encounters a triple fault, it stops its activity on the bus and
generates the shutdown special bus cycle (BE[7:0]# = FEh).


The system logic must assert NMI, INIT, RESET, or SMI# to get 
the processor out of the shutdown state. 


[Timing diagram omitted. Phases: Shutdown Occurs (Triple Fault), then Shutdown Special Cycle. Signals: ADS#, LOCK#, M/IO#, D/C#, W/R#, D[63:0], KEN#, BRDY#.]

Figure 73. Shutdown Cycle

Stop Grant and Stop Clock States


Figure 74 and Figure 75 show the processor transition from 
normal execution to the Stop Grant state, then to the Stop 
Clock state, back to the Stop Grant state, and finally back to 
normal execution. The series of transitions begins when the 
processor samples STPCLK# asserted. On recognizing a 
STPCLK# interrupt at the next instruction retirement 
boundary, the processor performs the following actions, in the 
order shown: 


1. Its instruction pipelines are flushed 
2. All pending and in-progress bus cycles are completed 


3. The STPCLK# assertion is acknowledged by executing a 
Stop Grant special bus cycle 


4. Its internal clock is stopped after BRDY# of the Stop Grant 
special bus cycle is sampled asserted and after EWBE# is 
sampled asserted (if EWBE# is masked off, then entry into 
the Stop Grant state is not affected by EWBE#) 


5. The Stop Clock state is entered if the system logic stops the 
bus clock CLK (optional) 


STPCLK# is sampled as a level-sensitive input on every clock 
edge but is not recognized until the next instruction boundary. 
The system logic drives the signal either synchronously or 
asynchronously. If it is asserted asynchronously, it must be 
asserted for a minimum pulse width of two clocks. STPCLK# 
must remain asserted until recognized, which is indicated by 
the completion of the Stop Grant special cycle.

[Timing diagram omitted. Marked event: STPCLK# Sampled Asserted. Signals: CLK, A[31:3], ADS#, M/IO#, D/C#, W/R#, CACHE#, STPCLK#, D[63:0], KEN#, BRDY#.]

Figure 74. Stop Grant and Stop Clock Modes, Part 1

[Timing diagram omitted. Phases: Stop Clock, Stop Grant State (re-entered after PLL stabilization), STPCLK# Sampled Negated, Normal.]

Figure 75. Stop Grant and Stop Clock Modes, Part 2

INIT-Initiated Transition from Protected Mode to Real Mode


INIT is typically asserted in response to a BIOS interrupt that 
writes to an I/O port. This interrupt is often in response to a
Ctrl-Alt-Del keyboard input. The BIOS writes to a port (similar 
to port 64h in the keyboard controller) that asserts INIT. INIT is 
also used to support 80286 software that must return to Real 
mode after accessing extended memory in Protected mode. 


The assertion of INIT causes the processor to empty its
pipelines, initialize most of its internal state, and branch to
address FFFF_FFF0h, the same instruction execution starting
point used after RESET. Unlike RESET, the processor
preserves the contents of its caches, the floating-point state, the
MMX state, Model-Specific Registers (MSRs), the CD and NW
bits of the CR0 register, the time stamp counter, and other
specific internal resources.


Figure 76 shows an example in which the operating system 
writes to an I/O port, causing the system logic to assert INIT. 
The sampling of INIT asserted starts an extended microcode 
sequence that terminates with a code fetch from FFFF_FFF0h,
the reset location. INIT is sampled on every clock edge but is 
not recognized until the next instruction boundary. During an 
I/O write cycle, it must be sampled asserted a minimum of three 
clock edges before BRDY# is sampled asserted if it is to be 
recognized on the boundary between the I/O write instruction 
and the following instruction. If INIT is asserted synchronously, 
it can be asserted for a minimum of one clock. If it is asserted 
asynchronously, it must have been negated for a minimum of 
two clocks, followed by an assertion of a minimum of two clocks.

[Timing diagram omitted. Marked events: INIT Sampled Asserted, Code Fetch (A[31:3] = FFFF_FFF0h). Signals: CLK, A[31:3], BE[7:0]#, ADS#, M/IO#, D/C#, W/R#, D[63:0], KEN#, BRDY#, INIT.]

Figure 76. INIT-Initiated Transition from Protected Mode to Real Mode

6 Power-on Configuration and Initialization





On power-on the system logic must reset the AMD-K6-2 
processor by asserting the RESET signal. When the processor 
samples RESET asserted, it immediately flushes and initializes 
all internal resources and its internal state, including its 
pipelines and caches, the floating-point state, the MMX and 
3DNow! states, and all registers. Then the processor jumps to 
address FFFF_FFF0h to start instruction execution.


6.1 Signals Sampled During the Falling Transition of RESET

FLUSH#

FLUSH# is sampled on the falling transition of RESET to
determine if the processor begins normal instruction execution
or enters Tri-State Test mode. If FLUSH# is High during the
falling transition of RESET, the processor unconditionally runs
its Built-In Self Test (BIST), performs the normal reset
functions, then jumps to address FFFF_FFF0h to start
instruction execution. (See “Built-In Self-Test (BIST)” on page
221 for more details.) If FLUSH# is Low during the falling
transition of RESET, the processor enters Tri-State Test mode.
(See “Tri-State Test Mode” on page 222 and “FLUSH# (Cache
Flush)” on page 103 for more details.)

BF[2:0]

The internal operating frequency of the processor is
determined by the state of the bus frequency signals BF[2:0]
when they are sampled during the falling transition of RESET.
The frequency of the CLK input signal is multiplied internally
by a ratio defined by BF[2:0]. (See “BF[2:0] (Bus Frequency)”
on page 92 for the processor-clock to bus-clock ratios.)

BRDYC#

BRDYC# is sampled on the falling transition of RESET to
configure the drive strength of A[20:3], ADS#, HITM#, and
W/R#. If BRDYC# is Low during the fall of RESET, these
outputs are configured using higher drive strengths than the
standard strength. If BRDYC# is High during the fall of RESET,
the standard strength is selected. (See “BRDYC# (Burst Ready
Copy)” on page 95 for more details.)

6.2 RESET Requirements
During the initial power-on reset of the processor, RESET must 
remain asserted for a minimum of 1.0 ms after CLK and Vcc 
reach specification. (See “CLK Switching Characteristics” on 
page 267 for clock specifications. See “Electrical Data” on page 
253 for Vcc specifications.) 
During a warm reset while CLK and Vcc are within 
specification, RESET must remain asserted for a minimum of 
15 clocks prior to its negation. 
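
The two minimums above are stated in different units (time for a power-on reset, clocks for a warm reset), so a small conversion helps when sizing reset logic. The following is a sketch only; the function names are hypothetical and the bus frequency is supplied by the caller as an integer number of hertz.

```python
COLD_RESET_MIN_MS = 1       # minimum RESET assertion after CLK and Vcc are in spec
WARM_RESET_MIN_CLOCKS = 15  # minimum RESET assertion during a warm reset


def cold_reset_min_clocks(clk_hz):
    """Smallest whole number of CLK periods that covers the 1.0 ms minimum."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-clk_hz * COLD_RESET_MIN_MS // 1000)


def warm_reset_min_seconds(clk_hz):
    """Duration of the 15-clock warm-reset minimum at the given bus frequency."""
    return WARM_RESET_MIN_CLOCKS / clk_hz
```

At a 100-MHz bus clock, the power-on minimum corresponds to 100,000 clocks, several thousand times longer than the 15-clock warm-reset minimum.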
6.3 State of Processor After RESET 
Output Signals

Table 31 shows the state of all processor outputs and
bidirectional signals immediately after RESET is sampled
asserted.

Table 31. Output Signal State After RESET

Signal           | State    | Signal   | State
A[31:3], AP      | Floating | LOCK#    | High
ADS#, ADSC#      | High     | M/IO#    | Low
APCHK#           | High     | PCD      | Low
BE[7:0]#         | Floating | PCHK#    | High
BREQ             | Low      | PWT      | Low
CACHE#           | High     | SCYC     | Low
D/C#             | Low      | SMIACT#  | High
D[63:0], DP[7:0] | Floating | TDO      | Floating
FERR#            | High     | VCC2DET  | Low
HIT#             | High     | VCC2H/L# | Low
HITM#            | High     | W/R#     | Low
HLDA             | Low      | -        | -

Registers

Table 32 on page 175 shows the state of all architecture
registers and Model-Specific Registers (MSRs) after the
processor has completed its initialization due to the recognition
of the assertion of RESET.

Table 32. Register State After RESET




























































































Register                | State (hex)                     | Notes
GDTR                    | base:0000_0000h limit:0FFFFh    |
IDTR                    | base:0000_0000h limit:0FFFFh    |
TR                      | 0000h                           |
LDTR                    | 0000h                           |
EIP                     | FFFF_FFF0h                      |
EFLAGS                  | 0000_0002h                      |
EAX                     | 0000_0000h                      | 1
EBX                     | 0000_0000h                      |
ECX                     | 0000_0000h                      |
EDX                     | 0000_058Xh                      | 2
ESI                     | 0000_0000h                      |
EDI                     | 0000_0000h                      |
EBP                     | 0000_0000h                      |
ESP                     | 0000_0000h                      |
CS                      | F000h                           |
SS                      | 0000h                           |
DS                      | 0000h                           |
ES                      | 0000h                           |
FS                      | 0000h                           |
GS                      | 0000h                           |
FPU Stack R7-R0         | 0000_0000_0000_0000_0000h       | 3
FPU Control Word        | 0040h                           | 3
FPU Status Word         | 0000h                           | 3
FPU Tag Word            | 5555h                           | 3
FPU Instruction Pointer | 0000_0000_0000h                 | 3
FPU Data Pointer        | 0000_0000_0000h                 | 3
FPU Opcode Register     | 000_0000_0000b                  | 3
Table 32. Register State After RESET (continued)




































































Register | State (hex)                          | Notes
CR0      | 6000_0010h                           | 4
CR2      | 0000_0000h                           |
CR3      | 0000_0000h                           |
CR4      | 0000_0000h                           |
DR7      | 0000_0400h                           |
DR6      | FFFF_0FF0h                           |
DR3      | 0000_0000h                           |
DR2      | 0000_0000h                           |
DR1      | 0000_0000h                           |
DR0      | 0000_0000h                           |
MCAR     | 0000_0000_0000_0000h                 | 3
MCTR     | 0000_0000_0000_0000h                 | 3
TR12     | 0000_0000_0000_0000h                 | 3
TSC      | 0000_0000_0000_0000h                 | 3
EFER     | 0000_0000_0000_0000h (Model 8/[7:0]) |
         | 0000_0000_0000_0002h (Model 8/[F:8]) |
STAR     | 0000_0000_0000_0000h                 |
WHCR     | 0000_0000_0000_0000h                 | 3
UWCCR    | 0000_0000_0000_0000h                 | 3, 5
PSOR     | 0000_0000_0000_01SBh                 | 3, 5, 6
PFIR     | 0000_0000_0000_0000h                 | 3, 5

Notes:
1. The contents of EAX indicate if BIST was successful. If EAX = 0000_0000h, BIST was successful.
   If EAX is non-zero, BIST failed.
2. EDX contains the AMD-K6-2 processor signature, where X indicates the processor Stepping ID.
3. The contents of these registers are preserved following the recognition of INIT.
4. The CD and NW bits of CR0 are preserved following the recognition of INIT.
5. UWCCR, PSOR, and PFIR are implemented only on AMD-K6-2 processor Model 8/[F:8].
6. “S” represents the Stepping. “B” represents PSOR[3:0], where PSOR[3] equals 0, and
   PSOR[2:0] is equal to the value of the BF[2:0] signals sampled during the falling transition of
   RESET.
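
Notes 1 and 2 describe what early boot code can learn from EAX and EDX immediately after RESET. The sketch below shows one way to interpret the two values. The function name is hypothetical, and the split of EDX into family, model, and stepping nibbles assumes the conventional x86 signature layout (the table itself only states that the low digit is the Stepping ID).

```python
def decode_reset_state(eax, edx):
    """Interpret the EAX and EDX values left by the processor after RESET."""
    return {
        # Note 1: EAX = 0000_0000h means BIST passed; non-zero means it failed.
        "bist_passed": eax == 0,
        # Assumed conventional layout: family in bits 11:8, model in bits 7:4.
        "family": (edx >> 8) & 0xF,
        "model": (edx >> 4) & 0xF,
        # Note 2: the low digit (X in 0000_058Xh) is the Stepping ID.
        "stepping": edx & 0xF,
    }
```

Under that assumption, a value of 0000_058Ch in EDX decodes as family 5, model 8, stepping Ch.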
6.4 State of Processor After INIT


The recognition of the assertion of INIT causes the processor to
empty its pipelines, to initialize most of its internal state, and to
branch to address FFFF_FFF0h, the same instruction
execution starting point used after RESET. Unlike RESET, the
processor preserves the contents of its caches, the
floating-point state, the MMX and 3DNow! states, MSRs, and
the CD and NW bits of the CR0 register.


The edge-sensitive interrupts FLUSH# and SMI# are sampled 
and preserved during the INIT process and are handled 
accordingly after the initialization is complete. However, the 
processor resets any pending NMI interrupt upon sampling 
INIT asserted. 


INIT can be used as an accelerator for 80286 code that requires 
a reset to exit from Protected mode back to Real mode.

7 Cache Organization













The following sections describe the basic architecture and 
resources of the AMD-K6-2 processor internal caches. 


The performance of the AMD-K6-2 processor is enhanced by a 
writeback level-one (L1) cache. The cache is organized as a 
separate 32-Kbyte instruction cache and a 32-Kbyte data cache, 
each with two-way set associativity (See Figure 77). The cache 
line size is 32 bytes, and lines are fetched from main memory 
using an efficient, pipelined burst transaction. As the 
instruction cache is filled, each instruction byte is analyzed for 
instruction boundaries using predecode logic. Predecoding 
annotates each instruction byte with information that later 
enables the decoders to efficiently decode multiple instructions 
simultaneously. Translation lookaside buffers (TLB) are also 
used to translate linear addresses to physical addresses. The 
instruction cache is associated with a 64-entry TLB while the 
data cache is associated with a 128-entry TLB. 


[Block diagram omitted. Elements shown: System Bus Interface Unit; 32-Kbyte Instruction Cache with predecode logic, two ways of tag RAM with state bits, and a 64-entry TLB; Processor Core; 32-Kbyte Data Cache with two ways of tag RAM with MESI bits and a 128-entry TLB.]

Figure 77. Cache Organization

The processor cache design takes advantage of a sectored
organization (See Figure 78). Each sector consists of 64 bytes 
configured as two 32-byte cache lines. The two cache lines of a 
sector share a common tag but have separate MESI (modified, 
exclusive, shared, invalid) bits that track the state of each cache 
line. 
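
The sizes given above fix the number of sets: 32 Kbytes divided by two ways of 64-byte sectors yields 256 sets per cache. The following sketch works through that arithmetic and splits an address into tag, set, line, and byte fields. The bit positions are an assumption based on conventional set-associative indexing (byte offset in the low bits, then line select, then set index); the data sheet text here states only the sizes, and the function name is hypothetical.

```python
CACHE_BYTES = 32 * 1024   # each of the instruction and data caches
WAYS = 2                  # two-way set associative
SECTOR_BYTES = 64         # one tag per sector
LINE_BYTES = 32           # two lines per sector

SETS = CACHE_BYTES // (WAYS * SECTOR_BYTES)        # 256 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1          # 5 bits select a byte in a line
SECTOR_BITS = SECTOR_BYTES.bit_length() - 1        # 6 bits cover a whole sector
TAG_SHIFT = SECTOR_BITS + SETS.bit_length() - 1    # tag starts at bit 14


def split_address(addr):
    """Return (tag, set_index, line_in_sector, byte_in_line) for an address.

    Assumed layout: byte offset in bits 4:0, line select in bit 5,
    set index in bits 13:6, and the tag above that.
    """
    byte_in_line = addr & (LINE_BYTES - 1)
    line_in_sector = (addr >> OFFSET_BITS) & 1
    set_index = (addr >> SECTOR_BITS) & (SETS - 1)
    tag = addr >> TAG_SHIFT
    return tag, set_index, line_in_sector, byte_in_line
```

Two addresses that differ only in bit 5 land in the same sector and share a tag, which is why each line needs its own state bits.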


Instruction Cache Line

Tag Address | Cache Line 0 | Byte 31 | Predecode Bits | Byte 30 | Predecode Bits | ... | Byte 0 | Predecode Bits | 1 MESI Bit
            | Cache Line 1 | Byte 31 | Predecode Bits | Byte 30 | Predecode Bits | ... | Byte 0 | Predecode Bits | 1 MESI Bit

Data Cache Line

Tag Address | Cache Line 0 | Byte 31 | Byte 30 | ... | Byte 0 | 2 MESI Bits
            | Cache Line 1 | Byte 31 | Byte 30 | ... | Byte 0 | 2 MESI Bits






































Note: Instruction-cache lines have only two coherency states (valid or invalid) rather than 
the four MESI coherency states of data-cache lines. Only two states are needed for the 
instruction cache because these lines are read-only. 


Figure 78. Cache Sector Organization 


7.1 MESI States in the Data Cache

The state of each line in the caches is tracked by the MESI bits.
The coherency of these states or MESI bits is maintained by
internal processor snoops and external inquire cycles by the
system logic. The following four states are defined for the data
cache:

Modified: This line has been modified and is different from
main memory.

Exclusive: This line is not modified and is the same as main
memory. If this line is written to, it becomes Modified.

Shared: The same line can exist in more than one cache
system.

Invalid: The information in this line is not valid.

7.2 Predecode Bits


Decoding x86 instructions is particularly difficult because the 
instructions vary in length, ranging from 1 to 15 bytes long. 
Predecode logic supplies the predecode bits associated with
each instruction byte. The predecode bits indicate the number
of bytes to the start of the next x86 instruction. The predecode 
bits are passed with the instruction bytes to the decoders where 
they assist with parallel x86 instruction decoding. The 
predecode bits use memory separate from the 32-Kbyte 
instruction cache. The predecode bits are stored in an extended 
instruction cache alongside each x86 instruction byte as shown 
in Figure 78. 


7.3 Cache Operation 


The operating modes for the caches are configured by software 
using the not writethrough (NW) and cache disable (CD) bits of 
control register 0 (CR0 bits 29 and 30, respectively). These bits
are used in all operating modes. 


When the CD and NW bits are both set to 0, the cache is fully 
enabled. This is the standard operating mode for the cache. If a
read miss occurs when the processor reads from the cache, a 
line fill (32-byte burst read) on the system bus occurs in order to 
fetch the cache line. Write hits to the cache are updated, while 
write misses and writes to shared lines cause external memory 
updates. Refer to Table 36 on page 193 for a summary of cache 
read and write cycles and the effect of these operations on the 
cache MESI state. 


Note: A write allocate operation can modify the behavior of write 
misses to the cache. See “Write Allocate” on page 186. 


When CD is set to 0 and NW is set to 1, an invalid mode of 
operation exists that causes a general protection fault to occur. 


When CD is set to 1 (disabled) and NW is set to 0, the cache fill 
mechanism is disabled but the contents of the cache are still 
valid. The processor reads from the cache and, if a read miss 
occurs, no line fill takes place. Write hits to the cache are 
updated, while write misses and writes to shared lines cause 
external memory updates. If PWT is driven Low and WB/WT# is 
sampled High, a write hit to a shared line changes the cache- 
line state to exclusive. 


When the CD and NW bits are both set to 1, the cache is fully 
disabled. Even though the cache is disabled, the contents are 
not necessarily invalid. The processor reads from the cache and, 
if a read miss occurs, no line fill takes place. If a write hit
occurs, the cache is updated but an external memory update
does not occur. If a data line is in the exclusive state during a 
write hit, the cache-line state is changed to modified. Cache 
lines in the shared state remain in the shared state after a write 
hit. Write misses access external memory directly. 
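
The four CD/NW combinations described above can be summarized as a lookup on the CR0 value. This is a minimal sketch; the function name and the mode strings are illustrative labels rather than terms defined by the data sheet.

```python
CR0_NW = 1 << 29   # not writethrough
CR0_CD = 1 << 30   # cache disable


def cache_mode(cr0):
    """Classify the cache operating mode selected by CR0.CD and CR0.NW."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "fully enabled"
    if not cd and nw:
        return "invalid (general protection fault)"
    if cd and not nw:
        return "line fills disabled"
    return "fully disabled"
```

Applied to the CR0 reset value of 6000_0010h, in which both bits are set, this returns "fully disabled".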


The operating system can control the cacheability of a page. 
The paging mechanism is controlled by CR3, the Page Directory 
Entry (PDE), and the Page Table Entry (PTE). Within CR3, 
PDE, and PTE are Page Cache Disable (PCD) and Page 
Writethrough (PWT) bits. The values of the PCD and PWT bits 
used in Table 33 and Table 34 are taken from either the PTE or 
PDE. For more information see the descriptions of PCD and 
PWT on pages 113 and 115, respectively. 


Table 33 describes how the PWT signal is driven based on the 
values of the PWT bits and the PG bit of CR0.


Table 33. PWT Signal Generation 





PWT Bit* | PG Bit of CR0 | PWT Signal
1        | 1             | High
0        | 1             | Low
1        | 0             | Low
0        | 0             | Low

Note:
* PWT is taken from PTE or PDE











Table 34 describes how the PCD signal is driven based on the
values of the CD bit of CR0, the PCD bits, and the PG bit of
CR0.


Table 34. PCD Signal Generation 





CD Bit of CR0 | PCD Bit* | PG Bit of CR0 | PCD Signal
1             | X        | X             | High
0             | 1        | 1             | High
0             | 0        | 1             | Low
0             | 1        | 0             | Low
0             | 0        | 0             | Low

Note:
* PCD is taken from PTE or PDE
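
Tables 33 and 34 each reduce to a short Boolean expression. The sketch below states them as functions that return True when the signal is driven High; the function names are hypothetical, and the expressions are derived from the table rows rather than quoted from the data sheet.

```python
def pwt_signal(pwt_bit, pg):
    """Table 33: PWT is High only when paging is enabled and the PWT bit is 1."""
    return bool(pwt_bit and pg)


def pcd_signal(cd, pcd_bit, pg):
    """Table 34: PCD is High when CR0.CD is 1, or when paging is enabled
    and the PCD bit from the PTE or PDE is 1."""
    return bool(cd or (pcd_bit and pg))
```

With paging disabled (PG = 0), the page-level bits have no effect, so PWT is always Low and PCD simply follows the CD bit.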
















Table 35 describes how the CACHE# signal is driven based on 
the cycle type, the CI bit of TR12, the PCD signal, and the 
UWCCR model-specific register. 


Table 35. CACHE# Signal Generation 


























Cycle Type                  | CI Bit of TR12 | PCD Signal | WC or UC* | CACHE#
Writebacks                  | X              | X          | X         | Low
Unlocked Reads              | 0              | 0          | 0         | Low
Locked Reads                | X              | X          | X         | High
Single Writes               | X              | X          | X         | High
Any Cycle Except Writebacks | 1              | X          | X         | High
Any Cycle Except Writebacks | X              | 1          | X         | High
Any Cycle Except Writebacks | X              | X          | 1         | High

Note:
* WC and UC refer to Write-Combining and Uncacheable Memory Ranges as defined in the UWCCR, and
apply only to the AMD-K6-2 processor Model 8/[F:8].
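
Read row by row, Table 35 says that CACHE# is asserted (Low) only for writebacks and for unlocked reads with no inhibiting condition. A minimal sketch of that rule follows; the function name and the cycle-type strings are hypothetical labels chosen for this example.

```python
def cache_asserted(cycle, ci=0, pcd=0, wc_or_uc=0):
    """Return True when CACHE# is driven Low (asserted) per Table 35.

    cycle is one of "writeback", "unlocked_read", "locked_read",
    or "single_write" (labels invented for this sketch).
    """
    if cycle == "writeback":
        return True                      # always Low, other inputs ignored
    if cycle == "unlocked_read":
        # Any of CI, PCD, or a WC/UC range forces CACHE# High.
        return not (ci or pcd or wc_or_uc)
    return False                         # locked reads and single writes
```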











Cache-Related Signals

Complete descriptions of the signals that control cacheability
and cache coherency are given on the following pages:

CACHE#: page 96
EADS#: page 100
FLUSH#: page 103
HIT#: page 104
HITM#: page 104
INV: page 108
KEN#: page 109
PCD: page 113
PWT: page 115
WB/WT#: page 123


7.4 Cache Disabling and Flushing 


To completely disable all cache accesses, the CD bit must be set 
to 1 and the cache must be completely flushed. 


There are three different methods for flushing the cache. The
first method relies on the system logic and the other two methods
rely on software.

For the system logic to flush the cache, the processor must
sample FLUSH# asserted. In this method, the processor writes 
back any data cache lines that are in the modified state, 
invalidates all lines in the instruction and data caches, and then 
executes a flush acknowledge special cycle (See Table 25 on 
page 126). 


The second method for flushing the caches is for software to 
execute the WBINVD instruction which causes all modified 
lines to first be written back to memory, then marks all cache 
lines as invalid. Alternatively, if writing modified lines back to 
memory is not necessary, the INVD instruction can be used to 
invalidate all cache lines. 


The third and final method for flushing the caches is to make 
use of the Page Flush/Invalidate Register (PFIR), which allows 
cache invalidation and optional flushing of a specific 4-Kbyte 
page from the linear address space. The PFIR is only supported 
on the AMD-K6-2 processor Model 8/[F:8] (see “PFIR” on page 
195). Unlike the previous two methods of flushing the caches, 
this particular method requires the software to be aware of 
which specific pages must be flushed and invalidated. 


7.5 Cache-Line Fills 


The processor performs a cache-line fill for any area of system 
memory defined as cacheable. If an area of system memory is 
not explicitly defined as uncacheable by the software or system 
logic, or implicitly treated as uncacheable by the processor, 
then the memory access is assumed to be cacheable. 


Software can prevent caching of certain pages by setting the 
PCD bit in the PDE or PTE. Additionally, for the AMD-K6-2 
processor Model 8/[F:8], software can define regions of memory 
as uncacheable or write combinable by programming the 
MTRRs in the UWCCR MSR (see “Memory Type Range 
Registers” on page 203). Write-combinable memory is defined 
as uncacheable. 


The system logic also has control of the cacheability of bus 
cycles. If it determines the address is not cacheable, system 
logic negates the KEN# signal when asserting the first BRDY# 
or NA# of a cycle.

The processor does not cache certain memory accesses such as
locked operations. In addition, the processor does not cache
PDE or PTE memory reads (referred to as page table walks) in
the L1 cache.


When the processor needs to read memory, the processor drives 
a read cycle onto the bus. If the cycle is cacheable, the 
processor asserts CACHE#. If the cycle is not cacheable, a 
non-burst, single-transfer read takes place. The processor waits 
for the system logic to return the data and assert a single 
BRDY# (See Figure 55 on page 133). If the cycle is cacheable, 
the processor executes a 32-byte burst read cycle. The processor 
expects a total of four BRDY# signals for a burst read cycle to 
take place (See Figure 57 on page 137). 


Cache-line fills initiate 32-byte burst read cycles from memory 
on the system bus for the instruction cache and the data cache. 
If a data-cache line being filled replaces a modified line, the 
modified contents of the line are copied to a 32-byte writeback 
(copyback) buffer in the bus interface unit while the new line is 
being read. 


7.6 Cache-Line Replacements 


As programs execute and task switches occur, some cache lines 
eventually require replacement. 


Instruction cache lines are replaced using a Least Recently 
Used (LRU) algorithm. If line replacement is required, lines are 
replaced when read cache misses occur. 


The data cache uses a slightly different approach to line 
replacement. If a miss occurs, and a replacement is required, 
lines are replaced by using a Least Recently Allocated (LRA) 
algorithm. 


Two forms of cache misses and associated cache fills can take 
place—a tag-miss cache fill and a tag-hit cache fill. In the case 
of a tag-miss cache fill, the miss is due to a tag mismatch, in 
which case the required cache line is filled from external 
memory, and the cache line within the sector that was not 
required is marked as invalid. In the case of a tag-hit cache fill, 
the address matches the tag, but the requested cache line is 
marked as invalid. The required cache line is filled from 
external memory, and the cache line within the sector that is 
not required remains in the same cache state.

7.7 Write Allocate

Write allocate, if enabled, occurs when the processor has a
pending memory write cycle to a cacheable line and the line
does not currently reside in the data cache. In this case, the
processor performs a 32-byte burst read cycle to fetch the
data-cache line addressed by the pending write cycle. The data
associated with the pending write cycle is merged with the
recently-allocated data-cache line and stored in the processor’s
data cache. The final MESI state of the cache line depends on
the state of the WB/WT# and PWT signals during the burst read
cycle and the subsequent L1 data cache write hit (See Table 36
on page 193 to determine the cache-line states and the access
types following a cache read miss and cache write hit).


If a data-cache line fetch from memory is attempted because
the write allocate misses the data cache, and KEN# is sampled
negated, the processor does not perform an allocation. In this
case, the pending write cycle is executed as a single write cycle
on the system bus.


During write allocates, a 32-byte burst read cycle is executed in
place of a non-burst write cycle. While the burst read cycle
generally takes longer to execute than the non-burst write
cycle, performance gains are realized on subsequent write cycle
hits to the write-allocated cache line. Due to the nature of
software, memory accesses tend to occur in proximity of each
other (principle of locality). The likelihood of additional write
hits to the write-allocated cache line is high.


The following is a description of three mechanisms by which the
AMD-K6-2 processor performs write allocations. A write
allocate is performed when any one or more of these
mechanisms indicates that a pending write is to a cacheable
area of memory.


Write to a Cacheable Page


Every time the processor performs a cache line fill, the address
of the page in which the cache line resides is saved in the 
Cacheability Control Register (CCR). The page address of 
subsequent write cycles is compared with the page address 
stored in the CCR. If the two addresses are equal, then the 
processor performs a write allocate because the page has 
already been determined to be cacheable.
When the processor performs a cache line fill from a different
page than the address saved in the CCR, the CCR is updated 
with the new page address. 


If the address of a pending write cycle matches the tag address 
of a valid cache sector, but the addressed cache line within the 
sector is marked invalid (a sector hit but a cache line miss), 
then the processor performs a write allocate. The pending write 
cycle is determined to be cacheable because the sector hit 
indicates the presence of at least one valid cache line in the 
sector. The two cache lines within a sector are guaranteed by 
design to be within the same page. 


The AMD-K6-2 processor uses two mechanisms that are 
programmable within the Write Handling Control Register 
(WHCR) to enable write allocations for write cycles that 
address a definable area, or a special 1-Mbyte memory area. 
The format of the WHCR differs between the AMD-K6-2 
processor Model 8/[7:0] and the AMD-K6-2 processor Model 
8/[F:8]. 


WHCR - Model 8/[7:0]. This WHCR contains three fields—the 
WCDE bit, the Write Allocate Enable Limit (WAELIM) field, 
and the Write Allocate Enable 15-to-16-Mbyte (WAE15M) bit 
(See Figure 79 on page 187). 


For proper functionality, always program the WCDE bit to 0. 


Symbol | Description                          | Bits
WCDE   | Always program to 0                  | 8
WAELIM | Write Allocate Enable Limit          | 7-1
WAE15M | Write Allocate Enable 15-to-16-Mbyte | 0


Note: Hardware RESET initializes this MSR to all zeros. 


Figure 79. Write Handling Control Register (WHCR) — Model 8/[7:0] 


Write Allocate Enable Limit - Model 8/[7:0]. The WAELIM field is 7
bits wide. This field, multiplied by 4 Mbytes, defines an upper
memory limit. Any pending write cycle that addresses memory
below this limit causes the processor to perform a write allocate
(assuming the address is not within a range where write
allocates are disallowed). Write allocate is disabled for memory
accesses at and above this limit unless the processor determines
a pending write cycle is cacheable by means of one of the other
write allocate mechanisms: “Write to a Cacheable Page” and
“Write to a Sector.” The maximum value of this memory limit is
((2^7 - 1) × 4 Mbytes) = 508 Mbytes. When all the bits in this field
are set to 0, all memory is above this limit and the write allocate
mechanism is disabled (even if all bits in the WAELIM field are
set to 0, write allocates can still occur due to the “Write to a
Cacheable Page” and “Write to a Sector” mechanisms).


WHCR - Model 8/[F:8]. This WHCR contains two fields—the Write 
Allocate Enable Limit (WAELIM) field, and the Write Allocate 
Enable 15-to-16-Mbyte (WAE15M) bit (see Figure 80). 




















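[Register diagram omitted. Bit positions marked: 63, 32, 31, 22, 21, 17, 16, 15, 0. WAELIM occupies bits 31-22 and WAE15M occupies bit 16; all other bits are reserved.]

Symbol | Description                          | Bits
WAELIM | Write Allocate Enable Limit          | 31-22
WAE15M | Write Allocate Enable 15-to-16-Mbyte | 16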


Note: Hardware RESET initializes this MSR to all zeros. 


Figure 80. Write Handling Control Register (WHCR)— Model 8/[F:8] 


Write Allocate Enable Limit - Model 8/[F:8]. The WAELIM field is 10 
bits wide. This field, multiplied by 4 Mbytes, defines an upper 
memory limit. Any pending write cycle that addresses memory 
below this limit causes the processor to perform a write allocate 
(assuming the address is not within a range where write 
allocates are disallowed). Write allocate is disabled for memory 
accesses at and above this limit unless the processor determines 
a pending write cycle is cacheable by means of one of the other 
write allocate mechanisms— “Write to a Cacheable Page” and 
“Write to a Sector.” The maximum value of this limit is 
((2^10 − 1) × 4 Mbytes) = 4092 Mbytes. When all the bits in this 
field are set to 0, all memory is above this limit and the write 
allocate mechanism is disabled (even if all bits in the WAELIM 
field are set to 0, write allocates can still occur due to the 
“Write to a Cacheable Page” and “Write to a Sector” mechanisms). 
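
The limit arithmetic for both field widths can be sketched in C as follows. This is a minimal illustration, not code from AMD: the helper names `waelim_limit_bytes` and `waelim_for_memory` are hypothetical, and the only facts taken from the text are the 4-Mbyte granularity and the 7-bit and 10-bit field widths.

```c
#include <assert.h>
#include <stdint.h>

/* Granularity of the WAELIM field: 4 Mbytes per count (from the text). */
#define WAELIM_UNIT_BYTES (4ull << 20)

/* Hypothetical helper: upper write-allocate limit in bytes for a WAELIM
 * field value. field_bits is 7 (Model 8/[7:0]) or 10 (Model 8/[F:8]). */
static uint64_t waelim_limit_bytes(unsigned waelim, unsigned field_bits)
{
    assert(field_bits == 7 || field_bits == 10);
    assert(waelim < (1u << field_bits));
    return (uint64_t)waelim * WAELIM_UNIT_BYTES;
}

/* Hypothetical helper: largest WAELIM value whose limit does not exceed
 * the given memory size, clamped to the width of the field. */
static unsigned waelim_for_memory(uint64_t mem_bytes, unsigned field_bits)
{
    assert(field_bits == 7 || field_bits == 10);
    uint64_t max_field = (1u << field_bits) - 1u;
    uint64_t units = mem_bytes / WAELIM_UNIT_BYTES;
    return (unsigned)(units > max_field ? max_field : units);
}
```

With these definitions, a field value of 127 in the 7-bit layout gives 508 Mbytes and a field value of 1023 in the 10-bit layout gives 4092 Mbytes, matching the maxima stated above.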


Write Allocate Enable 15-to-16-Mbyte -All Steppings. The Write Allocate 
Enable 15-to-16-Mbyte (WAE15M) bit is used to enable write 
allocations for memory write cycles that address the 1 Mbyte of 
memory between 15 Mbytes and 16 Mbytes. This bit must be set 
to 1 to allow write allocate in this memory area. This bit is 
provided to account for a small number of uncommon 
memory-mapped I/O adapters that use this particular memory 
address space. If the system contains one of these peripherals, 
the bit should be set to 0 (even if the WAE15M bit is set to 0, 
write allocates can still occur between 15 Mbytes and 16 
Mbytes due to the “Write to a Cacheable Page” and “Write to a 
Sector” mechanisms). The WAE15M bit is ignored if the value 
in the WAELIM field is set to less than 16 Mbytes. 


By definition a write allocate is not performed in the memory 
area between 640 Kbytes and 1 Mbyte unless the processor 
determines a pending write cycle is cacheable by means of one 
of the other write allocate mechanisms—“Write to a Cacheable 
Page” and “Write to a Sector.” It is not considered safe to 
perform write allocations between 640 Kbytes and 1 Mbyte 
(000A_0000h to 000F_FFFFh) because it is considered a 
noncacheable region of memory. 


For AMD-K6-2 processor Model 8/[F:8], if a memory region is 
defined as write-combinable or uncacheable by a MTRR, write 
allocates are not performed in that region. 


Figure 81 shows the logic flow for all the mechanisms involved 
with write allocate for memory bus cycles. The left side of the 
diagram (the text) describes the conditions that need to be true 
in order for the value of that line to be a 1. Items 1 to 4 of the 
diagram are related to general cache operation and items 5 to 
10 are related to the write allocate mechanisms. 


For more information about write allocate, see the 
Implementation of Write Allocate in the K86™ Processors 
Application Note, order# 21326. 

[Figure 81 diagram. Inputs: 1) CD Bit of CR0; 2) PCD Signal; 
3) CI Bit of TR12; 4) UC or WC; 5) Write to Cacheable Page (CCR); 
6) Write to a Sector; 7) Less Than Limit (WAELIM); 8) Between 
640 Kbytes and 1 Mbyte; 9) Between 15-16 Mbytes; 10) Write 
Allocate Enable 15-16 Mbyte (WAE15M). Output: Perform Write 
Allocate.] 



Figure 81. Write Allocate Logic Mechanisms and Conditions 


The following list describes the corresponding items in Figure 
81: 


1. CD Bit of CR0—When the cache disable (CD) bit within 
control register 0 (CR0) is set to 1, the cache fill mechanism 
for both reads and writes is disabled and write allocate does 
not occur. 


2. PCD Signal—When the PCD (page cache disable) signal is 
driven High, caching for that page is disabled, even if KEN# 
is sampled asserted, and write allocate does not occur. 


3. CI Bit of TR12—When the cache inhibit bit of Test Register 
12 is set to 1, L1 cache fills are disabled and write allocate 
does not occur. 


4. UC or WC—If a pending write cycle addresses a region of 
memory defined as write combinable or uncacheable by an 
MTRR, write allocates are not performed in that region. 
MTRRs are only supported in the AMD-K6-2 processor 
Model 8/[F:8]. For all other steppings, treat this condition as 
equal to 0. 


5. Write to a Cacheable Page (CCR)—A write allocate is 
performed if the processor knows that a page is cacheable. 
The CCR is used to store the page address of the last cache 
fill for a read miss. See “Write to a Cacheable Page” on page 
186 for a detailed description of this condition. 

6. Write to a Sector—A write allocate is performed if the 
address of a pending write cycle matches the tag address of a 
valid cache sector but the addressed cache line within the 
sector is invalid. See “Write to a Sector” on page 187 for a 
detailed description of this condition. 


7. Less Than Limit (WAELIM)—The write allocate limit 
mechanism determines if the memory area being addressed 
is less than the limit set in the WAELIM field of WHCR. If 
the address is less than the limit, write allocate for that 
memory address is performed as long as conditions 8 
through 10 do not prevent write allocate (even if conditions 
8 and 10 attempt to prevent write allocate, condition 5 or 6 
allows write allocate to occur). 


8. Between 640 Kbytes and 1 Mbyte—Write allocate is not 
performed in the memory area between 640 Kbytes and 1 
Mbyte. It is not considered safe to perform write allocations 
between 640 Kbytes and 1 Mbyte (000A_0000h to 
000F_FFFFh) because this area of memory is considered a 
noncacheable region of memory (even if condition 8 
attempts to prevent write allocate, condition 5 or 6 allows 
write allocate to occur). 


9. Between 15-16 Mbytes—If the address of a pending write 
cycle is in the 1 Mbyte of memory between 15 Mbytes and 16 
Mbytes, and the WAE15M bit is set to 1, write allocate for 
this cycle is enabled. 


10. Write Allocate Enable 15-16 Mbytes (WAE15M)—This 
condition is associated with the Write Allocate Limit 
mechanism and affects write allocate only if the limit 
specified by the WAELIM field is greater than or equal to 
16 Mbytes. If the memory address is between 15 Mbytes and 
16 Mbytes, and the WAE15M bit in the WHCR is set to 0, 
write allocate for this cycle is disabled (even if condition 10 
attempts to prevent write allocate, condition 5 or 6 allows 
write allocate to occur). 
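
The ten conditions above can be read as a single Boolean decision. The following C sketch is one interpretation of that list, not a description of the actual hardware: the struct, its field names, and the function name are hypothetical, and the combination of terms is inferred from the wording of items 1 through 10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One input per numbered condition. All names here are hypothetical. */
struct wa_inputs {
    bool cr0_cd;          /* 1: CD bit of CR0 is set */
    bool pcd;             /* 2: PCD is driven High */
    bool tr12_ci;         /* 3: CI bit of TR12 is set */
    bool uc_or_wc;        /* 4: MTRR marks the region UC or WC (always
                                false on models without MTRRs) */
    bool cacheable_page;  /* 5: Write to a Cacheable Page */
    bool sector_hit;      /* 6: Write to a Sector */
    uint64_t addr;        /* physical address of the pending write */
    uint64_t limit;       /* WAELIM multiplied by 4 Mbytes */
    bool wae15m;          /* WAE15M bit of WHCR */
};

static bool write_allocate(const struct wa_inputs *in)
{
    assert(in != 0);
    /* Conditions 1-4: general cache controls prevent any write allocate. */
    if (in->cr0_cd || in->pcd || in->tr12_ci || in->uc_or_wc)
        return false;
    /* Conditions 5 and 6 each allow write allocate on their own. */
    if (in->cacheable_page || in->sector_hit)
        return true;
    /* Condition 7, gated by condition 8 and by conditions 9 and 10. */
    bool below_limit = in->addr < in->limit;
    bool in_640k_1m  = in->addr >= 0x000A0000u && in->addr <= 0x000FFFFFu;
    bool in_15m_16m  = in->addr >= (15ull << 20) && in->addr < (16ull << 20);
    return below_limit && !in_640k_1m && !(in_15m_16m && !in->wae15m);
}
```

Writing the decision this way makes the precedence explicit: conditions 1 through 4 override everything, conditions 5 and 6 override the address-range exclusions, and the WAELIM limit applies only when neither of those two mechanisms already allows the allocate.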
7.8 Prefetching 

Hardware Prefetching 

The AMD-K6-2 processor conditionally performs cache 
prefetching which results in the filling of the required cache 
line first, and a prefetch of the second cache line making up the 
other half of the sector. From the perspective of the external 
bus, the two cache-line fills typically appear as two 32-byte 
burst read cycles occurring back-to-back or, if allowed, as 
pipelined cycles. The burst read cycles do not occur 
back-to-back (wait states occur) if the processor is not ready to 
start a new cycle, if higher priority data read or write requests 
exist, or if NA# (next address) was sampled negated. Wait states 
can also exist between burst cycles if the processor samples 
AHOLD or BOFF# asserted. 

Software Prefetching 


The 3DNow! technology includes an instruction called 
PREFETCH that allows a cache line to be prefetched into the 
data cache. Unlike prefetching under hardware control, 
software prefetching only fetches the cache line specified by 
the operand of the PREFETCH instruction, and does not 
attempt to fetch the other cache line in the sector. The 
PREFETCH instruction format is defined in Table 17, 
“3DNow!™ Instructions,” on page 81. For more detailed 
information, see the 3DNow!™ Technology Manual, order# 
21928. 


7.9 Cache States 


Table 36 shows all the possible cache-line states before and 
after program-generated accesses to individual cache lines. The 
table includes the correspondence between MESI states and 
writethrough or writeback states for lines in the data cache. 

Table 36. Data Cache States for Read and Write Accesses 

Type          Cache State      Access Type (1)              Cache State After Access (8) 
              Before Access                                 MESI State       Writeback/ 
                                                                             Writethrough State 

Cache Read 
  Read Miss   invalid          single read from bus         invalid          - 
              invalid          burst read from bus,         shared or        writethrough or 
                               fill cache (2)               exclusive (3)    writeback (3) 
  Read Hit    exclusive        -                            exclusive        writeback 
              modified         -                            modified         writeback 
              shared           -                            shared           writethrough 
Cache Write 
  Write Miss  invalid          single write to bus (4)      invalid          - 
              invalid          burst read from bus, fill    modified (6)     - 
                               cache, write to cache (5) 
              invalid          burst read from bus, fill    shared (7)       - 
                               cache, write to cache, 
                               single write to bus (5) 
  Write Hit   exclusive or     write to cache               modified         writeback 
              modified 
              shared           write to cache, single       shared or        writethrough or 
                               write to bus                 exclusive (3)    writeback (3) 

Notes: 
1. Single read, single write, cache update, and writethrough = 1 to 8 bytes. Line fill = 32-byte burst read. 
2. If CACHE# is driven Low and KEN# is sampled asserted. 
3. If PWT is driven Low and WB/WT# is sampled High, the line is cached in the exclusive (writeback) state. If PWT is driven High or 
   WB/WT# is sampled Low, the line is cached in the shared (writethrough) state. 
4. Assumes the write allocate conditions as specified in “Write Allocate” on page 186 are not met. 
5. Assumes the write allocate conditions as specified in “Write Allocate” on page 186 are met. 
6. Assumes PWT is driven Low and WB/WT# is sampled High. 
7. Assumes PWT is driven High or WB/WT# is sampled Low. 
8. The final MESI state assumes that the state of the WB/WT# signal remains the same for all accesses to a particular cache line. 
-  Not applicable or none. 

7.10 Cache Coherency 

Different ways exist to maintain coherency between the system 
memory and cache memories. Inquire cycles, internal snoops, 
FLUSH#, WBINVD, INVD, and line replacements all prevent 
inconsistencies between memories. 

Inquire Cycles 

Inquire cycles are bus cycles initiated by system logic which 
ensure coherency between the caches and main memory. In 
systems with multiple bus masters, system logic maintains 
cache coherency by driving inquire cycles to the processor. 
System logic initiates inquire cycles by asserting AHOLD, 
BOFF#, or HOLD to obtain control of the address bus and then 
driving EADS#, INV (optional), and an inquire address 
(A[31:5]). This type of bus cycle causes the processor to 
compare the tags for both its instruction and data caches with 
the inquire address. If there is a hit to a shared or exclusive line 
in the data cache or a valid line in the instruction cache, the 
processor asserts HIT#. If the compare hits a modified line in 
the data cache, the processor asserts HIT# and HITM#. If 
HITM# is asserted, the processor writes the modified line back 
to memory. If INV was sampled asserted with EADS#, a hit 
invalidates the line. If INV was sampled negated with EADS#, a 
hit leaves the line in the shared state or transitions it from the 
exclusive or modified state to the shared state. 

Table 37 on page 197 shows the effects of inquire cycles— 
performed with INV equal to 0 (non-invalidating) and INV 
equal to 1 (invalidating)—snoops, and invalidations. 

Internal Snooping 

Internal snooping is initiated by the processor (rather than 
system logic) during certain cache accesses. It is used to 
maintain coherency between the instruction cache and the data 
cache. 


The processor automatically snoops its instruction cache during 
read or write misses to its data cache, and it snoops its data 
cache during read misses to its instruction cache. Table 37 
summarizes the actions taken during this internal snooping. 

If an internal snoop hits its target, the processor does the 
following: 

■ Data cache snoop during an instruction-cache read miss—If 
  modified, the line in the data cache is written back on the 
  system bus to external memory. Regardless of its state, the 
  data-cache line is invalidated and the instruction cache 
  performs a burst read cycle from external memory. 

■ Instruction cache snoop during a data cache miss—The line in 
  the instruction cache is marked invalid, and the data-cache 
  read or write is performed as defined in Table 36 on 
  page 193. 


In response to sampling FLUSH# asserted, the processor writes 
back any data cache lines that are in the modified state and 
then marks all lines in the instruction and data caches as 
invalid. 


The AMD-K6-2 processor Model 8/[F:8] contains the 
Page Flush/Invalidate Register (PFIR) that allows cache 
invalidation and optional flushing of a specific 4-Kbyte page 
from the linear address space (see Figure 82). When the PFIR is 
written to (using the WRMSR instruction), the invalidation 
and, optionally, the flushing begins. The total amount of cache 
in the AMD-K6-2 processor is 64 Kbytes. Using this register can 
result in a much lower cycle count for flushing particular pages 
versus flushing the entire cache. 


Figure 82. Page Flush/Invalidate Register (PFIR)—MSR C000_0088h 


LINPAGE. This 20-bit field must be written with bits 31:12 of the 
linear address of the 4-Kbyte page that is to be invalidated and 
optionally flushed from the L1 cache. 





PF. If an attempt to invalidate or flush a page results in a page 
fault, the processor sets the PF bit to 1, and the invalidate or 
flush operation is not performed (even though invalidate 
operations do not normally generate page faults). In this case, 
an actual page fault exception is not generated. If the PF bit 
equals 0 after an invalidate or flush operation, then the 
operation executed successfully. The PF bit must be read after 
every write to the PFIR register to determine if the invalidate 
or flush operation executed successfully. 

F/I. This bit is used to control the type of action that occurs to 
the specified linear page. If a 0 is written to this bit, the 
operation is a flush, in which case all cache lines in the 
modified state within the specified page are written back to 
memory, after which the entire page is invalidated. If a 1 is 
written to this bit, the operation is an invalidation, in which 
case the entire page is invalidated without the occurrence of 
any writebacks. 

WBINVD and INVD 

These x86 instructions cause all cache lines to be marked as 
invalid. WBINVD writes back modified lines before marking all 
cache lines invalid. INVD does not write back modified lines. 

Cache-Line Replacement 

Replacing lines in the instruction or data cache, according to 
the line replacement algorithms described in “Cache-Line 
Fills” on page 184, ensures coherency between main memory 
and the caches. 


Table 37 on page 197 shows all possible cache-line states before 
and after various cache-related operations. 





Table 37. Cache States for Inquire Cycles, Snoops, Flushes, and Invalidation 

Type of Operation    Cache State            Memory Access       Cache State After Operation 
                     Before Operation                           MESI State         Writeback/ 
                                                                                   Writethrough State 

Inquire Cycle        shared or exclusive    -                   INV=0: shared      writethrough 
                                                                INV=1: invalid     invalid 
                     modified               writeback to bus    INV=0: shared      writethrough 
                                                                INV=1: invalid     invalid 
Internal Snoop       shared or exclusive    -                   invalid            invalid 
                     modified               writeback to bus    invalid            invalid 
FLUSH# Signal        shared or exclusive    -                   invalid            invalid 
                     modified               writeback to bus    invalid            invalid 
PFIR* (F/I=0)        shared or exclusive    -                   invalid            invalid 
                     modified               writeback to bus    invalid            invalid 
PFIR* (F/I=1)        -                      -                   invalid            invalid 
WBINVD Instruction   shared or exclusive    -                   invalid            invalid 
                     modified               writeback to bus    invalid            invalid 
INVD Instruction     -                      -                   invalid            invalid 

Notes: 
All writebacks are 32-byte burst write cycles. 
- Not applicable or none. 
* The AMD-K6-2 processor Model 8/[F:8] supports the PFIR. 

Cache Snooping 

Table 38 shows the conditions under which snooping occurs in 
the AMD-K6-2 processor and the resources that are snooped. 


Table 38. Snoop Action 

Type of Event     Type of Access                   Snooping Action 
                                                   Instruction Cache    Data Cache 

Inquire Cycle     System Logic                     yes (1)              yes (1) 
Internal Snoop    Instruction Cache Read Miss      -                    yes (2) 
                  Instruction Cache Read Hit       -                    - 
                  Data Cache Read Miss             yes (3)              - 
                  Data Cache Read Hit              -                    - 
                  Data Cache Write Miss            yes (3)              - 
                  Data Cache Write Hit             -                    - 

Notes: 

1. The processor's response to an inquire cycle depends on the state of the INV input signal 
   and the state of the cache line as follows: 
   For the instruction cache, if INV is sampled negated, the line remains invalid or valid, but 
   if INV is sampled asserted, the line is invalidated. 
   For the data cache, if INV is sampled negated, valid lines remain in or transition to the 
   shared state, a modified data cache line is written back before the line is marked shared 
   (with HITM# asserted), and invalid lines remain invalid. For the data cache, if INV is 
   sampled asserted, the line is marked invalid. Modified lines are written back before 
   invalidation. 

2. If an internal snoop hits a modified line in the data cache, the line is written back and 
   invalidated. Then the instruction cache performs a burst read from memory. 

3. If an internal snoop hits a line in the instruction cache, the instruction cache line is 
   invalidated and the data-cache read or write is performed from memory. 


- Not applicable. 

Writethrough versus Writeback Coherency States 

The terms writethrough and writeback apply to two related 
concepts in a read-write cache like the AMD-K6-2 processor L1 
data cache. The following conditions apply to both the 
writethrough and writeback modes: 

■ Memory Writes—A relationship exists between external 
  memory writes and their concurrence with cache updates: 

  • An external memory write that occurs concurrently with 
    a cache update to the same location is a writethrough. 
    Writethroughs are driven as single cycles on the bus. 

  • An external memory write that occurs after the processor 
    has modified a cache line is a writeback. Writebacks are 
    driven as burst cycles on the bus. 

■ Coherency State—A relationship exists between MESI 
  coherency states and writethrough-writeback coherency 
  states of lines in the cache as follows: 

  • Shared and invalid MESI lines are in the writethrough 
    state. 

  • Modified and exclusive MESI lines are in the writeback 
    state. 


A20M# Masking of Cache Accesses 


Although the processor samples A20M# as a level-sensitive 
input on every clock edge, it should only be asserted in Real 
mode. The processor applies the A20M# masking to its tags, 
through which all programs access the caches. Therefore, 
assertion of A20M# affects all addresses (cache and external 
memory), including the following: 

■ Cache-line fills (caused by read misses or write allocates) 

■ Cache writethroughs (caused by write misses or write hits to 
  lines in the shared state) 

However, A20M# does not mask writebacks or invalidations 
caused by the following actions: 

■ Internal snoops 
■ Inquire cycles 
■ The FLUSH# signal 
■ Writing to the PFIR (AMD-K6-2/[F:8] only) 
■ The WBINVD instruction 

Write Merge Buffer 





The AMD-K6-2 processor Model 8/[F:8] contains an 8-byte write 
merge buffer that allows the processor to conditionally combine 
data from multiple noncacheable write cycles into this merge 
buffer. The merge buffer operates in conjunction with the 
Memory Type Range Registers (MTRRs). Refer to “Memory 
Type Range Registers” on page 203 for a description of the 
MTRRs. 


Merging multiple write cycles into a single write cycle reduces 
processor bus utilization and processor stalls, thereby 
increasing the overall system performance. 


EWBE Control 


The presence of the merge buffer creates the potential to 
perform out-of-order write cycles relative to the processor’s L1 
cache. In general, the ordering of write cycles that are driven 
externally on the system bus and those that hit the processor’s 
cache can be controlled by the EWBE# signal. See “EWBE# 
(External Write Buffer Empty)” on page 101 for more 
information. If EWBE# is sampled negated, the processor 
delays the commitment of write cycles to cache lines in the 
modified state or exclusive state in the processor’s cache. 
Therefore, the system logic can enforce strong ordering by 
negating EWBE# until the external write cycle is complete, 
thereby ensuring that a subsequent write cycle that hits the 
cache does not complete ahead of the external write cycle. 


However, the addition of the write merge buffer introduces the 
potential for out-of-order write cycles to occur between writes 
to the merge buffer and writes to the processor’s cache. Because 
these writes occur entirely within the processor and are not 
sent out to the processor bus, the system logic is not able to 
enforce strong ordering with the EWBE# signal. 


The EWBE control (EWBEC) bits in the EFER register provide 
a mechanism for enforcing three different levels of write 
ordering in the presence of the write merge buffer: 


■ EFER[3] is defined as the Global EWBE Disable (GEWBED). 
  When GEWBED equals 1, the processor does not attempt to 
  enforce any write ordering internally or externally (the 
  EWBE# signal is ignored). This is the maximum performance 
  setting. 

■ EFER[2] is defined as the Speculative EWBE Disable 
  (SEWBED). SEWBED only affects the processor when 
  GEWBED equals 0. If GEWBED equals 0 and SEWBED 
  equals 1, the processor enforces strong ordering for all 
  internal write cycles with the exception of write cycles 
  addressed to a range of memory defined as uncacheable 
  (UC) or write-combining (WC) by the MTRRs. In addition, 
  the processor samples the EWBE# signal. If EWBE# is 
  sampled negated, the processor delays the commitment of 
  write cycles to processor cache lines in the modified state or 
  exclusive state until EWBE# is sampled asserted. 

  This setting provides performance comparable to, but 
  slightly less than, the performance obtained when 
  GEWBED equals 1 because some degree of write ordering is 
  maintained. 

■ If GEWBED equals 0 and SEWBED equals 0, the processor 
  enforces strong ordering for all internal and external write 
  cycles. In this setting, the processor assumes, or speculates, 
  that strong order must be maintained between writes to the 
  merge buffer and writes that hit the processor’s cache. Once 
  the merge buffer is written out to the processor’s bus, the 
  EWBE# signal is sampled. If EWBE# is sampled negated, the 
  processor delays the commitment of write cycles to 
  processor cache lines in the modified state or exclusive state 
  until EWBE# is sampled asserted. 

  This setting is the default after RESET and provides the 
  lowest performance of the three settings because full write 
  ordering is maintained. 
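
The three settings described above can be decoded from an EFER value as in the following C sketch. The bit positions (EFER[3] and EFER[2]) come from the text; the macro, enum, and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define EFER_SEWBED (1u << 2)  /* EFER[2], Speculative EWBE Disable */
#define EFER_GEWBED (1u << 3)  /* EFER[3], Global EWBE Disable */

/* Hypothetical names for the three write-ordering levels. */
enum write_ordering {
    ORDER_NONE,              /* no write ordering enforced, EWBE# ignored */
    ORDER_ALL_EXCEPT_UC_WC,  /* strong ordering except UC/WC ranges */
    ORDER_ALL                /* strong ordering for all write cycles */
};

static enum write_ordering ewbec_ordering(uint32_t efer)
{
    if (efer & EFER_GEWBED)
        return ORDER_NONE;              /* SEWBED has no effect here */
    if (efer & EFER_SEWBED)
        return ORDER_ALL_EXCEPT_UC_WC;
    return ORDER_ALL;                   /* both bits 0: default after RESET */
}
```

Checking GEWBED first mirrors the statement that SEWBED only affects the processor when GEWBED equals 0.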


Table 39 summarizes the three settings of the EWBEC field for 
the EFER register, along with the effect of write ordering and 
performance. For more information on the EFER register, see 
“Extended Feature Enable Register (EFER)—Model 8/[F:8]” on 
page 50. 


Table 39. EWBEC Settings 

EFER[3]      EFER[2]      Write 
(GEWBED)     (SEWBED)     Ordering             Performance 

1            0 or 1       None                 Best 
0            1            All except UC/WC     Close-to-Best 
0            0            All                  Slowest 

8.2 Memory Type Range Registers 

The AMD-K6-2 processor Model 8/[F:8] provides two 
variable-range Memory Type Range Registers (MTRRs)—MTRR0 
and MTRR1—that each specify a range of memory. Each range can 
be defined as one of the following memory types: 

■ Uncacheable (UC) memory—Memory read cycles are 
  sourced directly from the specified memory address and the 
  processor does not allocate a cache line. Memory write 
  cycles are targeted at the specified memory address and a 
  write allocation does not occur. 

■ Write-Combining (WC) memory—Memory read cycles are 
  sourced directly from the specified memory address and the 
  processor does not allocate a cache line. The processor 
  conditionally combines data from multiple noncacheable 
  write cycles that are addressed within this range into a 
  merge buffer. Merging multiple write cycles into a single 
  write cycle reduces processor bus utilization and processor 
  stalls, thereby increasing the overall system performance. 
  This memory type is applicable for linear video frame 
  buffers. 

UC/WC Cacheability Control Register (UWCCR) 

The MTRRs are accessed by addressing the 64-bit MSR known 
as the UC/WC Cacheability Control Register (UWCCR). The 
MSR address of the UWCCR is C000_0085h. Following reset, all 
bits in the UWCCR register are set to 0. MTRR0 (lower 32 bits 
of the UWCCR register) defines the size and memory type of 
range 0 and MTRR1 (upper 32 bits) defines the size and 
memory type of range 1 (see Figure 83). 





Bits     Field 
63-49    Physical Base Address 1 
48-34    Physical Address Mask 1 
33       WC1 
32       UC1 
31-17    Physical Base Address 0 
16-2     Physical Address Mask 0 
1        WC0 
0        UC0 

Bits 63-32 form MTRR1; bits 31-0 form MTRR0. 

Symbol    Description                     Bit 
UC0       Uncacheable Memory Type         0 
WC0       Write-Combining Memory Type     1 
UC1       Uncacheable Memory Type         32 
WC1       Write-Combining Memory Type     33 

Figure 83. UC/WC Cacheability Control Register (UWCCR)—MSR C000_0085h (Model 8/[F:8]) 


Physical Base Address n (n=0, 1). This address is the 15 most- 
significant bits of the physical base address of the memory 
range. The least-significant 17 bits of the base address are not 
needed because the base address is by definition always aligned 
on a 128-Kbyte boundary. 


Physical Address Mask n (n=0, 1). This value is the 15 most- 
significant bits of a physical address mask that is used to define 
the size of the memory range. This mask is logically ANDed 
with both the physical base address field of the UWCCR 
register and the physical address generated by the processor. If 
the results of the two AND operations are equal, then the 
generated physical address is considered within the range. That 
is, if: 


Mask & Physical Base Address = Mask & Physical Address Generated 


then the physical address generated by the processor is in the 
range. 
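
The comparison can be sketched in C as follows. This is an illustrative helper, not AMD code: the function name is hypothetical, and the sketch assumes the two 15-bit fields line up with physical address bits 31:17, as the field descriptions above state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* base_field and mask_field are the 15-bit UWCCR fields for one range.
 * Returns true when phys_addr falls inside the range they describe. */
static bool mtrr_range_hit(uint32_t base_field, uint32_t mask_field,
                           uint32_t phys_addr)
{
    assert(base_field < (1u << 15) && mask_field < (1u << 15));
    /* Drop the 17 low-order bits: ranges have 128-Kbyte granularity. */
    uint32_t addr_field = phys_addr >> 17;
    return (mask_field & base_field) == (mask_field & addr_field);
}
```

For example, with a base field of 000_0000_1000_0000b and a mask field of 111_1111_1000_0000b, the address 0100_0000h is inside the range and the address 0200_0000h is not.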


WCn (n=0, 1). When set to 1, this memory range is defined as 
write combinable (refer to Table 40). Write-combinable 
memory is uncacheable. 


UCn (n=0, 1). When set to 1, this memory range is defined as 
uncacheable (refer to Table 40). 

Table 40. WC/UC Memory Type 

WCn       UCn    Memory Type 

0         0      No effect on cacheability or write combining 
1         0      Write-combining memory range (uncacheable) 
0 or 1    1      Uncacheable memory range 

















Memory-Range Restrictions. The following rules regarding the 
address alignment and size of each range must be adhered to 
when programming the physical base address and physical 
address mask fields of the UWCCR register: 


■ The minimum size of each range is 128 Kbytes. 

■ The physical base address must be aligned on a 128-Kbyte 
  boundary. 

■ The physical base address must be range-size aligned. For 
  example, if the size of the range is 1 Mbyte, then the 
  physical base address must be aligned on a 1-Mbyte 
  boundary. 

■ All bits set to 1 in the physical address mask must be 
  contiguous. Likewise, all bits set to 0 in the physical address 
  mask must be contiguous. For example: 

  111_1111_1100_0000b is a valid physical address mask 
  111_1111_1101_0000b is invalid 


Table 41 lists the valid physical address masks and the resulting 
range sizes that can be programmed in the UWCCR register. 


Table 41. Valid Masks and Range Sizes 

Masks                    Size 

111_1111_1111_1111b      128 Kbytes 
111_1111_1111_1110b      256 Kbytes 
111_1111_1111_1100b      512 Kbytes 
111_1111_1111_1000b      1 Mbyte 
111_1111_1111_0000b      2 Mbytes 
111_1111_1110_0000b      4 Mbytes 
111_1111_1100_0000b      8 Mbytes 
111_1111_1000_0000b      16 Mbytes 
111_1111_0000_0000b      32 Mbytes 
111_1110_0000_0000b      64 Mbytes 
111_1100_0000_0000b      128 Mbytes 
111_1000_0000_0000b      256 Mbytes 
111_0000_0000_0000b      512 Mbytes 
110_0000_0000_0000b      1 Gbyte 
100_0000_0000_0000b      2 Gbytes 
000_0000_0000_0000b      4 Gbytes 
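
The contiguity rule and the mask-to-size relationship in Table 41 can be checked programmatically. The C sketch below is an illustration under the assumption that every valid mask is a run of 1s followed by a run of 0s, as the table shows; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MASK_BITS 15u
#define MASK_ALL  ((1u << MASK_BITS) - 1u)   /* 111_1111_1111_1111b */

/* A mask is valid when its 0 bits form one contiguous low-order run. */
static bool uwccr_mask_valid(uint32_t mask)
{
    if (mask > MASK_ALL)
        return false;
    uint32_t zeros = ~mask & MASK_ALL;   /* 0 bits of the mask, as 1s */
    return (zeros & (zeros + 1u)) == 0;  /* true when zeros == 2^k - 1 */
}

/* Range size in bytes: 128 Kbytes, doubled for every 0 bit in the mask. */
static uint64_t uwccr_range_bytes(uint32_t mask)
{
    assert(uwccr_mask_valid(mask));
    uint64_t zeros = ~mask & MASK_ALL;
    return (zeros + 1u) << 17;
}
```

The two masks used as examples in the restrictions above behave as expected: 111_1111_1100_0000b passes the check and yields 8 Mbytes, while 111_1111_1101_0000b fails it.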








Example. Suppose that the range of memory from 16 Mbytes to 
32 Mbytes is uncacheable, and the 8-Mbyte range of memory on 
top of 1 Gbyte is write-combinable. Range 0 is defined as the 
uncacheable range, and range 1 is defined as the write- 
combining range. 


Extracting the 15 most-significant bits of the 32-bit physical 
base address that corresponds to 16 Mbytes (0100_0000h) yields 
a physical base address 0 field of 000_0000_1000_0000b. 
Because the uncacheable range size is 16 Mbytes, the physical 
mask value 0 field is 111_1111_1000_0000b, according to Table 
41. Bit 1 of the UWCCR register (WC0) is set to 0 and bit 0 of 
the UWCCR register is set to 1 (UC0). 





Extracting the 15 most-significant bits of the 32-bit physical 
base address that corresponds to 1 Gbyte (4000_0000h) yields a 
physical base address 1 field of 010_0000_0000_0000b. Because 
the write-combining range size is 8 Mbytes, the physical mask 
value 1 field is 111_1111_1100_0000b, according to Table 41. Bit 
33 of the UWCCR register (WC1) is set to 1 and bit 32 of the 
UWCCR register is set to 0 (UC1). 
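
The two ranges in this example can be packed into a 64-bit UWCCR value as sketched below. The field positions follow Figure 83; the function names `mtrr_pack` and `uwccr_pack` are hypothetical, and the sketch only computes the value. Writing it to MSR C000_0085h requires the WRMSR instruction at privilege level 0, which is outside the scope of this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 32-bit MTRR half: physical base address in bits 31:17,
 * address mask in bits 16:2, WC in bit 1, UC in bit 0. */
static uint32_t mtrr_pack(uint32_t phys_base, uint32_t mask_field,
                          int wc, int uc)
{
    assert((phys_base & 0x1FFFFu) == 0);   /* 128-Kbyte aligned base */
    assert(mask_field < (1u << 15));       /* 15-bit mask field */
    return (phys_base & 0xFFFE0000u)
         | (mask_field << 2)
         | ((wc ? 1u : 0u) << 1)
         | (uc ? 1u : 0u);
}

/* MTRR0 occupies the lower 32 bits of UWCCR, MTRR1 the upper 32 bits. */
static uint64_t uwccr_pack(uint32_t mtrr0, uint32_t mtrr1)
{
    return ((uint64_t)mtrr1 << 32) | mtrr0;
}
```

For the example above, `mtrr_pack(0x01000000, 0x7F80, 0, 1)` gives 0101_FE01h for range 0 and `mtrr_pack(0x40000000, 0x7FC0, 1, 0)` gives 4001_FF02h for range 1, so the combined UWCCR value is 4001_FF02_0101_FE01h.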
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9 Floating-Point and Multimedia Execution Units 





Handling 
Floating-Point 
Exceptions 


External Logic 
Support of 
Floating-Point 
Exceptions 


Floating-Point Execution Unit 


The AMD-K6-2 processor contains an IEEE 754-compatible and 
854-compatible floating-point execution unit designed to 
accelerate the performance of software that utilizes the x86 
floating-point instruction set. Floating-point software is 
typically written to manipulate numbers that are very large or 
very small, that require a high degree of precision, or that result 
from complex mathematical operations such as 
transcendentals. Applications that take advantage of 
floating-point operations include geometric calculations for 
graphics acceleration, scientific, statistical, and engineering 
applications, and business applications that use large amounts 
of high-precision data. 


The high-performance floating-point execution unit contains an 
adder unit, a multiplier unit, and a divide/square root unit. 
These low-latency units can execute floating-point instructions 
in as few as two processor clocks. To increase performance, the 
processor is designed to simultaneously decode most 
floating-point instructions with most short-decodeable 
instructions. 


See “Software Environment” on page 21 for a description of the 
floating-point data types, registers, and instructions. 


The AMD-K6-2 processor provides the following two types of 
exception handling for floating-point exceptions: 


ms Ifthe numeric error (NE) bit in CRO is set to 1, the processor 
invokes the interrupt 10h handler. In this manner, the 
floating-point exception is completely handled by software. 


= If the NE bit in CRO is set to 0, the processor requires 
external logic to generate an interrupt on the INTR signal in 
order to handle the exception. 


The processor provides the FERR# (Floating-Point Error) and 
IGNNE# (Ignore Numeric Error) signals to allow the external 
logic to generate the interrupt in a manner consistent with 
IBM-compatible PC/AT systems. The assertion of FERR# 
indicates the occurrence of an unmasked floating-point 
exception resulting from the execution of a floating-point 
instruction. IGNNE# is used by the external hardware to control 
the effect of an unmasked floating-point exception. Under 
certain circumstances, if IGNNE# is sampled asserted, the 
processor ignores the floating-point exception. 


Figure 84 illustrates an implementation of external logic for 
supporting floating-point exceptions. The following example 
explains the operation of the external logic in Figure 84: 


As the result of a floating-point exception, the processor 
asserts FERR#. The assertion of FERR# and the 
sampling of IGNNE# negated indicates the processor has 
stopped instruction execution and is waiting for an 
interrupt. The assertion of FERR# leads to the assertion 
of INTR by the interrupt controller. The processor 
acknowledges the interrupt and jumps to the 
corresponding interrupt service routine in which an I/O 
write cycle to address port F0h leads to the assertion of 
IGNNE#. When IGNNE# is sampled asserted, the 
processor ignores the floating-point exception and 
continues instruction execution. When the processor 
negates FERR#, the external logic negates IGNNE#. 


See “FERR# (Floating-Point Error)” on page 102 and “IGNNE# 
(Ignore Numeric Exception)” on page 106 for more details. 





[Figure 84 block diagram. Recoverable labels: AMD-K6-2 Processor (FERR#, INTR, IGNNE#); I/O Address decode for Port F0h; RESET; IGNNE# Flip-Flop (CLOCK, DATA, CLEAR, Q); FERR# Flip-Flop (CLOCK, DATA, CLEAR, Q); Interrupt Controller (IRQ13).]

Figure 84. External Logic for Supporting Floating-Point Exceptions 

9.2 Multimedia and 3DNow!™ Execution Units 


The multimedia and 3DNow! execution units of the AMD-K6-2 
processor are designed to accelerate the performance of 
software written using the industry-standard MMX instructions 
and the new 3DNow! instructions. Applications that can take 
advantage of the MMX and 3DNow! instructions include 
graphics, video and audio compression and decompression, 
speech recognition, and telephony applications. 


The multimedia execution unit can execute MMX instructions 
in a single processor clock. All MMX and 3DNow! arithmetic 
instructions are pipelined for higher performance. To increase 
performance, the processor is designed to simultaneously 
decode all MMX and 3DNow! instructions with most other 
instructions. 


For more information on MMX instructions, see the AMD-K6® 
Processor Multimedia Technology Manual, order# 20726. For 
more information on 3DNow! instructions, see the 3DNow!™ 
Technology Manual, order# 21928. 


9.3 Floating-Point and MMX™/3DNow!™ Instruction Compatibility 


Registers 


The eight 64-bit MMX registers (which are also utilized by 
3DNow! instructions) are mapped on the floating-point stack. 
This enables backward compatibility with all existing software. 
For example, the register saving event that is performed by 
operating systems during task switching requires no changes to 
the operating system. The same support provided in an 
operating system’s interrupt 7 handler (Device Not Available) 
for saving and restoring the floating-point registers also 
supports saving and restoring the MMX registers. 


Exceptions 


There are no new exceptions defined for supporting the MMX 
and 3DNow! instructions. All exceptions that occur while 
decoding or executing an MMX or 3DNow! instruction are 
handled in existing exception handlers without modification. 


FERR# and IGNNE# 


MMX instructions and 3DNow! instructions do not generate 
floating-point exceptions. However, if an unmasked 
floating-point exception is pending, the processor asserts 
FERR# at the instruction boundary of the next floating-point 
instruction, MMX instruction, 3DNow! instruction, or WAIT 
instruction. 


The sampling of IGNNE# asserted only affects processor 
operation during the execution of an error-sensitive 
floating-point instruction, MMX instruction, 3DNow! 
instruction, or WAIT instruction when the NE bit in CR0 is set 
to 0. 
System Management Mode (SMM) 


10.1 Overview 


SMM is an alternate operating mode entered by way of a System 
Management Interrupt (SMI#) and handled by an interrupt 
service routine. SMM is designed for system control activities 
such as power management. These activities appear 
transparent to conventional operating systems like DOS and 
Windows. SMM is primarily targeted for use by the Basic Input 
Output System (BIOS) and specialized low-level device drivers. 
The code and data for SMM are stored in the SMM memory 
area, which is isolated from main memory. 


The processor enters SMM by the assertion of the SMI# 
interrupt and the processor’s acknowledgment by the assertion 
of SMIACT#. At this point the processor saves its state into the 
SMM memory state-save area and jumps to the SMM service 
routine. The processor returns from SMM when it executes the 
RSM (resume) instruction from within the SMM service 
routine. Subsequently, the processor restores its state from the 
SMM save area, negates SMIACT#, and resumes execution with 
the instruction following the point where it entered SMM. 


The following sections summarize the SMM state-save area, 
entry into and exit from SMM, exceptions and interrupts in 
SMM, memory allocation and addressing in SMM, and the SMI# 
and SMIACT# signals. 


10.2 SMM Operating Mode and Default Register Values 


The software environment within SMM has the following 
characteristics: 


■ Addressing and operation in Real mode 
■ 4-Gbyte segment limits 
■ Default 16-bit operand, address, and stack sizes, although 
instruction prefixes can override these defaults 
■ Control transfers that do not override the default operand 
size truncate the EIP to 16 bits 
■ Far jumps or calls cannot transfer control to a segment with 
a base address requiring more than 20 bits, as in Real mode 
segment-base addressing 
■ A20M# is masked 
■ Interrupt vectors use the Real-mode interrupt vector table 
■ The IF flag in EFLAGS is cleared (INTR not recognized) 
■ The TF flag in EFLAGS is cleared 
■ The NMI and INIT interrupts are disabled 
■ Debug register DR7 is cleared (debug traps disabled) 


Figure 85 shows the default map of the SMM memory area. It 
consists of a 64-Kbyte area, between 0003_0000h and 
0003_FFFFh, of which the top 32 Kbytes (0003_8000h to 
0003_FFFFh) must be populated with RAM. The default 
code-segment (CS) base address for the area—called the SMM 
base address—is at 0003_0000h. The top 512 bytes 
(0003_FE00h to 0003_FFFFh) contain a fill-down SMM 
state-save area. The default entry point for the SMM service 
routine is 0003_8000h. 
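
As a rough illustration of the default map just described, the key addresses can be derived from the SMM base address. This is a sketch only; the function and key names are hypothetical, and the offsets (8000h, FE00h, FFFFh) are the ones stated in the text.

```python
DEFAULT_SMM_BASE = 0x0003_0000  # default SMM base address (CS base)


def smm_layout(base: int = DEFAULT_SMM_BASE) -> dict:
    """Derive the main SMM memory-area addresses from an SMM base address."""
    return {
        "entry_point": base + 0x8000,        # SMM service routine entry point
        "state_save_bottom": base + 0xFE00,  # lowest byte of the state-save area
        "state_save_top": base + 0xFFFF,     # highest byte; the save fills down from here
        "state_save_size": 0xFFFF - 0xFE00 + 1,
    }


for name, value in smm_layout().items():
    print(f"{name}: {value:#x}")
```

The computed state-save size is 512 bytes, matching the "top 512 bytes" in the paragraph above.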
[Figure 85 memory map, from the top of the SMM memory area down]

0003_FFFFh   Top of the fill-down SMM state-save area
0003_FE00h   Bottom of the SMM state-save area
             SMM service routine (32-Kbyte minimum RAM, 0003_8000h to 0003_FFFFh)
0003_8000h   Service routine entry point
0003_0000h   SMM base address (CS)


Figure 85. SMM Memory 


Table 42 shows the initial state of registers when entering SMM. 


Table 42. Initial State of Registers in SMM 

Registers | SMM Initial State 
General Purpose Registers | unmodified 
EFLAGS | 0000_0002h 
CR0 | PE, EM, TS, and PG are cleared (bits 0, 2, 3, and 31). The other bits are unmodified. 
DR7 | 0000_0400h 
GDTR, LDTR, IDTR, TSSR, DR6 | unmodified 
EIP | 0000_8000h 
CS | 0003_0000h 
DS, ES, FS, GS, SS | 0000_0000h 

10.3 SMM State-Save Area 


When the processor acknowledges an SMI# interrupt by 
asserting SMIACT#, it saves its state in a 512-byte SMM 
state-save area shown in Table 43. The save begins at the top of 
the SMM memory area (SMM base address + FFFFh) and fills 
down to SMM base address + FE00h. 


Table 43 shows the offsets in the SMM state-save area relative 
to the SMM base address. The SMM service routine can alter 
any of the read/write values in the state-save area. 
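
A sketch of how a saved value could be located from its Table 43 offset follows. It is illustrative only: the 64-Kbyte image, the helper name, and the stored value are hypothetical, and each entry is assumed to be a 32-bit little-endian dword (the offsets are mostly four bytes apart, and x86 memory is little-endian).

```python
import struct

# A few offsets relative to the SMM base address, as listed in Table 43.
STATE_SAVE_OFFSETS = {
    "CR0": 0xFFFC,
    "CR3": 0xFFF8,
    "EFLAGS": 0xFFF4,
    "EIP": 0xFFF0,
    "EAX": 0xFFD0,
}


def read_saved(smm_area: bytes, name: str) -> int:
    """Read one 32-bit little-endian value from a 64-Kbyte SMM memory image."""
    (value,) = struct.unpack_from("<I", smm_area, STATE_SAVE_OFFSETS[name])
    return value


# Hypothetical image: 64 Kbytes of zeros with a made-up saved EIP.
image = bytearray(0x10000)
struct.pack_into("<I", image, STATE_SAVE_OFFSETS["EIP"], 0x0000_1234)
print(hex(read_saved(bytes(image), "EIP")))
```

Because the offsets are relative to the SMM base address, the same table applies after the base has been relocated.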


Table 43. SMM State-Save Area Map 

Address Offset | Contents Saved 
FFFCh | CR0 
FFF8h | CR3 
FFF4h | EFLAGS 
FFF0h | EIP 
FFECh | EDI 
FFE8h | ESI 
FFE4h | EBP 
FFE0h | ESP 
FFDCh | EBX 
FFD8h | EDX 
FFD4h | ECX 
FFD0h | EAX 
FFCCh | DR6 
FFC8h | DR7 
FFC4h | TR 
FFC0h | LDTR Base 
FFBCh | GS 
FFB8h | FS 
FFB4h | DS 
FFB0h | SS 
FFACh | CS 
FFA8h | ES 
FFA4h | I/O Trap Dword 
FFA0h | - 
FF9Ch | I/O Trap EIP* 
FF98h | - 
FF94h | - 
FF90h | IDT Base 
FF8Ch | IDT Limit 
FF88h | GDT Base 
FF84h | GDT Limit 
FF80h | TSS Attr 
FF7Ch | TSS Base 
FF78h | TSS Limit 
FF74h | - 
FF70h | LDT High 
FF6Ch | LDT Low 
FF68h | GS Attr 
FF64h | GS Base 
FF60h | GS Limit 
FF5Ch | FS Attr 
FF58h | FS Base 
FF54h | FS Limit 
FF50h | DS Attr 
FF4Ch | DS Base 
FF48h | DS Limit 
FF44h | SS Attr 
FF40h | SS Base 
FF3Ch | SS Limit 
FF38h | CS Attr 
FF34h | CS Base 
FF30h | CS Limit 
FF2Ch | ES Attr 
FF28h | ES Base 
FF24h | ES Limit 
FF20h | - 
FF1Ch | - 
FF18h | - 
FF14h | CR2 
FF10h | CR4 
FF0Ch | I/O Restart ESI* 
FF08h | I/O Restart ECX* 
FF04h | I/O Restart EDI* 
FF02h | HALT Restart Slot 
FF00h | I/O Trap Restart Slot 
FEFCh | SMM RevID 
FEF8h | SMM Base 
FEF7h-FE00h | - 

Notes: 
- No data dump at that address 
* Only contains information if SMI# is asserted during a valid I/O bus cycle. 

10.4 SMM Revision Identifier 
The SMM revision identifier at offset FEFCh in the SMM 
state-save area specifies the version of SMM and the extensions 
that are available on the processor. The SMM revision identifier 
fields are as follows: 
■ Bits 31-18: Reserved 
■ Bit 17: SMM base address relocation (1 = enabled) 
■ Bit 16: I/O trap restart (1 = enabled) 
■ Bits 15-0: SMM revision level for the AMD-K6-2 processor 
= 0002h 

Table 44 shows the format of the SMM Revision Identifier. 


Table 44. SMM Revision Identifier 

31-18 | 17 | 16 | 15-0 
Reserved | SMM Base Relocation | I/O Trap Extension | SMM Revision Level 
0 | 1 | 1 | 0002h 

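
The field layout above can be decoded with a few masks. The following is a minimal sketch; the function and key names are hypothetical, and the sample value 0003_0002h is simply the combination of the field values shown (bits 17 and 16 set, revision level 0002h).

```python
def decode_smm_revid(revid: int) -> dict:
    """Split the 32-bit SMM revision identifier into its documented fields."""
    return {
        "revision_level": revid & 0xFFFF,           # bits 15-0
        "io_trap_restart": bool(revid & (1 << 16)),  # bit 16
        "base_relocation": bool(revid & (1 << 17)),  # bit 17
    }


info = decode_smm_revid(0x0003_0002)
print(info)
```

An SMM handler would read this dword from offset FEFCh of the state-save area before relying on either extension.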
10.5 SMM Base Address 


During RESET, the processor sets the base address of the 
code-segment (CS) for the SMM memory area—the SMM base 
address—to its default, 0003_0000h. The SMM base address at 
offset FEF8h in the SMM state-save area can be changed by the 
SMM service routine to any address that is aligned to a 
32-Kbyte boundary. (Locations not aligned to a 32-Kbyte 
boundary cause the processor to enter the Shutdown state when 
executing the RSM instruction.) 


In some operating environments it may be desirable to relocate 
the 64-Kbyte SMM memory area to a high memory area in order 
to provide more low memory for legacy software. During system 
initialization, the base of the 64-Kbyte SMM memory area is 
relocated by the BIOS. To relocate the SMM base address, the 
system enters the SMM handler at the default address. This 
handler changes the SMM base address location in the SMM 
state-save area, copies the SMM handler to the new location, 
and exits SMM. 


The next time SMM is entered, the processor saves its state at 
the new base address. This new address is used for every SMM 
entry until the SMM base address in the SMM state-save area is 
changed or a hardware reset occurs. 
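
Because a misaligned base leads to the Shutdown state on RSM, a handler would want to validate the new address before writing it to offset FEF8h. The check below is a sketch under that assumption; the helper name and the example addresses are hypothetical.

```python
SMM_BASE_ALIGNMENT = 0x8000  # 32 Kbytes


def is_valid_smm_base(address: int) -> bool:
    """True if the address lies on a 32-Kbyte boundary, as required for relocation."""
    return address >= 0 and address % SMM_BASE_ALIGNMENT == 0


print(is_valid_smm_base(0x0003_0000))  # the default base
print(is_valid_smm_base(0x000A_0000))  # an example relocation target
print(is_valid_smm_base(0x0003_1000))  # not on a 32-Kbyte boundary
```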


10.6 Halt Restart Slot 


During entry into SMM, the halt restart slot at offset FF02h in 
the SMM state-save area indicates if SMM was entered from the 
Halt state. Before returning from SMM, the halt restart slot 
(offset FF02h) can be written to by the SMM service routine to 
specify whether the return from SMM takes the processor back 
to the Halt state or to the next instruction after the HLT 
instruction. 
Upon entry into SMM, the halt restart slot is defined as follows: 

■ Bits 15-1: Reserved 
■ Bit 0: Point of entry to SMM: 
    1 = entered from Halt state 
    0 = not entered from Halt state 


After entry into the SMI handler and before returning from 
SMM, the halt restart slot can be written using the following 
definition: 

■ Bits 15-1: Reserved 
■ Bit 0: Point of return when exiting from SMM: 
    1 = return to Halt state 
    0 = return to next instruction after the HLT instruction 


If the return from SMM takes the processor back to the Halt 
state, the HLT instruction is not re-executed, but the Halt 
special bus cycle is driven on the bus after the return. 
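
The two uses of bit 0 can be sketched as a pair of helpers. The names are hypothetical, and leaving the reserved bits 15-1 unchanged when writing is an assumption made here, not something the text specifies.

```python
HALT_BIT = 0x0001  # bit 0 of the 16-bit halt restart slot


def entered_from_halt(slot: int) -> bool:
    """On entry to SMM: was the processor in the Halt state?"""
    return bool(slot & HALT_BIT)


def with_return_point(slot: int, return_to_halt: bool) -> int:
    """Before RSM: choose the return point, leaving the reserved bits as read."""
    cleared = slot & 0xFFFF & ~HALT_BIT
    return cleared | (HALT_BIT if return_to_halt else 0)


slot = 0x0001  # hypothetical value read at offset FF02h
print(entered_from_halt(slot))
print(hex(with_return_point(slot, return_to_halt=False)))
```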


10.7 I/O Trap Dword 


If the assertion of SMI# is recognized during the execution of an 
I/O instruction, the I/O trap dword at offset FFA4h in the SMM 
state-save area contains information about the instruction. The 
fields of the I/O trap dword are configured as follows: 


■ Bits 31-16: I/O port address 
■ Bits 15-4: Reserved 
■ Bit 3: REP (repeat) string operation 
(1 = REP string, 0 = not a REP string) 
■ Bit 2: I/O string operation 
(1 = I/O string, 0 = not an I/O string) 
■ Bit 1: Valid I/O instruction (1 = valid, 0 = invalid) 
■ Bit 0: Input or output instruction (1 = INx, 0 = OUTx) 

Table 45 shows the format of the I/O trap dword. 


Table 45. I/O Trap Dword Configuration 

31-16 | 15-4 | 3 | 2 | 1 | 0 
I/O Port Address | Reserved | REP String Operation | I/O String Operation | Valid I/O Instruction | Input or Output 

The I/O trap dword is related to the I/O trap restart slot (see “I/O 
Trap Restart Slot”). If bit 1 of the I/O trap dword is set by the 
processor, it means that SMI# was asserted during the 
execution of an I/O instruction. The SMI handler tests bit 1 to 
see if there is a valid I/O instruction trapped. If the I/O 
instruction is valid, the SMI handler is required to ensure the 
I/O trap restart slot is set properly. The I/O trap restart slot 
informs the processor whether it should re-execute the I/O 
instruction after the RSM or execute the instruction following 
the trapped I/O instruction. 


Note: If SMI# is sampled asserted during an I/O bus cycle a 
minimum of three clock edges before BRDY# is sampled 
asserted, the associated I/O instruction is guaranteed to be 
trapped by the SMI handler. 
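
A minimal decoding sketch for the I/O trap dword follows. The function name, key names, and the sample value (port 01F7h, valid, input, non-string) are hypothetical; the bit positions are those listed for the dword at offset FFA4h.

```python
def decode_io_trap_dword(dword: int) -> dict:
    """Split the I/O trap dword into its documented fields."""
    return {
        "port": (dword >> 16) & 0xFFFF,  # bits 31-16
        "rep": bool(dword & 0x8),        # bit 3: REP string operation
        "string": bool(dword & 0x4),     # bit 2: I/O string operation
        "valid": bool(dword & 0x2),      # bit 1: valid I/O instruction
        "is_input": bool(dword & 0x1),   # bit 0: 1 = INx, 0 = OUTx
    }


trap = decode_io_trap_dword(0x01F7_0003)
print(trap)
```

A handler would test the `valid` field first, since the other fields are only meaningful when bit 1 is set.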


10.8 I/O Trap Restart Slot 


The I/O trap restart slot at offset FF00h in the SMM state-save 
area specifies whether the trapped I/O instruction should be 
re-executed on return from SMM. This slot in the state-save area 
is called the I/O instruction restart function. Re-executing a 
trapped I/O instruction is useful, for example, if an I/O write 
occurs to a disk that is powered down. The system logic 
monitoring such an access can assert SMI#. Then the SMM 
service routine would query the system logic, detect a failed I/O 
write, take action to power-up the I/O device, enable the I/O 
trap restart slot feature, and return from SMM. 


The fields of the I/O trap restart slot are defined as follows: 

■ Bits 31-16: Reserved 
■ Bits 15-0: I/O instruction restart on return from SMM: 
    0000h = execute the next instruction after the trapped 
    I/O instruction 
    00FFh = re-execute the trapped I/O instruction 


Table 46 shows the format of the I/O trap restart slot. 


Table 46. I/O Trap Restart Slot 

31-16 | 15-0 
Reserved | I/O instruction restart on return from SMM: 0000h = execute the next instruction after the trapped I/O instruction; 00FFh = re-execute the trapped I/O instruction 

The processor initializes the I/O trap restart slot to 0000h upon 
entry into SMM. If SMM was entered due to a trapped I/O 
instruction, the processor indicates the validity of the I/O 
instruction by setting or clearing bit 1 of the I/O trap dword at 
offset FFA4h in the SMM state-save area. The SMM service 
routine should test bit 1 of the I/O trap dword to determine if a 
valid I/O instruction was being executed when entering SMM 
and before writing the I/O trap restart slot. If the I/O instruction 
is valid, the SMM service routine can safely rewrite the I/O trap 
restart slot with the value 00FFh, which causes the processor to 
re-execute the trapped I/O instruction when the RSM 
instruction is executed. If the I/O instruction is invalid, writing 
the I/O trap restart slot has undefined results. 


If a second SMI# is asserted and a valid I/O instruction was 
trapped by the first SMM handler, the processor services the 
second SMI# prior to re-executing the trapped I/O instruction. 
The second entry into SMM never has bit 1 of the I/O trap dword 
set, and the second SMM service routine must not rewrite the 
I/O trap restart slot. 
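
The rule that the restart slot is written only when bit 1 of the I/O trap dword is set can be sketched as follows. The helper name and the use of `None` to mean "leave the slot untouched" are illustrative choices, not part of the processor definition.

```python
from typing import Optional

IO_RESTART_NEXT = 0x0000       # execute the instruction after the trapped one
IO_RESTART_REEXECUTE = 0x00FF  # re-execute the trapped I/O instruction


def choose_restart_slot(io_trap_dword: int, reexecute: bool) -> Optional[int]:
    """Return the value to write to the restart slot, or None to leave it alone.

    The slot is only written when bit 1 (valid I/O instruction) is set,
    because writing it for an invalid trap has undefined results.
    """
    if not io_trap_dword & 0x2:
        return None
    return IO_RESTART_REEXECUTE if reexecute else IO_RESTART_NEXT


print(choose_restart_slot(0x01F7_0002, reexecute=True))
print(choose_restart_slot(0x01F7_0000, reexecute=True))
```

This also covers the second-SMI# case described above: that entry never has bit 1 set, so the sketch returns `None` and the slot is left as is.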


During a simultaneous SMI# I/O instruction trap and debug 
breakpoint trap, the AMD-K6-2 processor first responds to the 
SMI# and postpones recognizing the debug exception until 
after returning from SMM via the RSM instruction. If the debug 
registers DR3-DR0 are used while in SMM, they must be saved 
and restored by the SMM handler. The processor automatically 
saves and restores DR7-DR6. If the I/O trap restart slot in the 
SMM state-save area contains the value 00FFh when the RSM 
instruction is executed, the debug trap does not occur until 
after the I/O instruction is re-executed. 


10.9 Exceptions, Interrupts, and Debug in SMM 


During an SMI# I/O trap, the exception/interrupt priority of the 
AMD-K6-2 processor changes from its normal priority. The 
normal priority places the debug traps at a priority higher than 
the sampling of the FLUSH# or SMI# signals. However, during 
an SMI# I/O trap, the sampling of the FLUSH# or SMI# signals 
takes precedence over debug traps. 


The processor recognizes the assertion of NMI within SMM 
immediately after the completion of an IRET instruction. Once 
NMI is recognized within SMM, NMI recognition remains 
enabled until SMM is exited, at which point NMI masking is 
restored to the state it was in before entering SMM. 
Test and Debug 





The AMD-K6-2 processor implements various test and debug 
modes to enable the functional and manufacturing testing of 
systems and boards that use the processor. In addition, the 
debug features of the processor allow designers to debug the 
instruction execution of software components. This chapter 
describes the following test and debug features: 


■ Built-In Self-Test (BIST): The BIST, which is invoked after 
the falling transition of RESET, runs internal tests that 
exercise most on-chip RAM structures. 

■ Tri-State Test Mode: A test mode that causes the processor 
to float its output and bidirectional pins. 

■ Boundary-Scan Test Access Port (TAP): The Joint Test Action 
Group (JTAG) test access function defined by the IEEE 
Standard Test Access Port and Boundary-Scan Architecture 
(IEEE 1149.1-1990) specification. 

■ Level-One (L1) Cache Inhibit: A feature that disables the 
processor’s internal L1 instruction and data caches. 

■ Debug Support: Consists of all x86-compatible software 
debug features, including the debug extensions. 


11.1 Built-In Self-Test (BIST) 


Following the falling transition of RESET, the processor 
unconditionally runs its BIST. The internal resources tested 
during BIST include the following: 


■ L1 instruction and data caches 
■ Instruction and Data Translation Lookaside Buffers (TLBs) 


The contents of the EAX general-purpose register after the 
completion of reset indicate if the BIST was successful. If EAX 
contains 0000_0000h, then BIST was successful. If EAX is 
non-zero, the BIST failed. Following the completion of the BIST, 
the processor jumps to address FFFF_FFF0h to start 
instruction execution, regardless of the outcome of the BIST. 


The BIST takes approximately 295,000 processor clocks to 
complete. 
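
To put the clock count in perspective, the sketch below converts it to time and applies the EAX pass/fail rule. The 300-MHz core frequency is only an example value chosen for illustration, and the helper names are hypothetical.

```python
BIST_CLOCKS = 295_000  # approximate BIST duration in processor clocks


def bist_passed(eax: int) -> bool:
    """BIST succeeded if EAX is 0000_0000h after reset completes."""
    return eax == 0


def bist_time_us(core_hz: float) -> float:
    """Approximate BIST duration in microseconds at a given core frequency."""
    return BIST_CLOCKS / core_hz * 1e6


print(bist_passed(0x0000_0000))
print(round(bist_time_us(300e6), 1))  # example frequency: 300 MHz
```

At the example frequency the BIST takes a little under one millisecond.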
11.2 Tri-State Test Mode 


The Tri-State Test mode causes the processor to float its output 
and bidirectional pins, which is useful for board-level 
manufacturing testing. In this mode, the processor is 
electrically isolated from other components on a system board, 
allowing automated test equipment (ATE) to test components 
that drive the same signals as those the processor floats. 


If the FLUSH# signal is sampled Low during the falling 
transition of RESET, the processor enters the Tri-State Test 
mode. (See “FLUSH# (Cache Flush)” on page 103 for the 
specific sampling requirements.) The signals floated in the 
Tri-State Test mode are as follows: 


■ A[31:3]      ■ D/C#       ■ M/IO# 
■ ADS#         ■ D[63:0]    ■ PCD 
■ ADSC#        ■ DP[7:0]    ■ PCHK# 
■ AP           ■ FERR#      ■ PWT 
■ APCHK#       ■ HIT#       ■ SCYC 
■ BE[7:0]#     ■ HITM#      ■ SMIACT# 
■ BREQ         ■ HLDA       ■ W/R# 
■ CACHE#       ■ LOCK# 


The VCC2DET, VCC2H/L#, and TDO signals are the only 
outputs not floated in the Tri-State Test mode. VCC2DET and 
VCC2H/L# must remain Low to ensure the system continues to 
supply the specified processor core voltage to the VCC2 pins. 
TDO is never floated because the Boundary-Scan Test Access 
Port must remain enabled at all times, including during the 
Tri-State Test mode. 


The Tri-State Test mode is exited when the processor samples 
RESET asserted. 
11.3 Boundary-Scan Test Access Port (TAP) 


Test Access Port 


The boundary-scan Test Access Port (TAP) is an IEEE standard 
that defines synchronous scanning test methods for complex 
logic circuits, such as boards containing a processor. The 
AMD-K6-2 processor supports the TAP standard defined in the 
IEEE Standard Test Access Port and Boundary-Scan Architecture 
(IEEE 1149.1-1990) specification. 


Boundary scan testing uses a shift register consisting of the 
serial interconnection of boundary-scan cells that correspond to 
each I/O buffer of the processor. This non-inverting register 
chain, called a Boundary Scan Register (BSR), can be used to 
capture the state of every processor pin and to drive every 
processor output and bidirectional pin to a known state. 


Each BSR of every component on a board that implements the 
boundary-scan architecture can be serially interconnected to 
enable component interconnect testing. 


The TAP consists of the following: 


■ Test Access Port (TAP) Controller: The TAP controller is a 
synchronous, finite state machine that uses the TMS and 
TDI input signals to control a sequence of test operations. 
See “TAP Controller State Machine” on page 232 for a list 
of TAP states and their definition. 


■ Instruction Register (IR): The IR contains the instructions 
that select the test operation to be performed and the Test 
Data Register (TDR) to be selected. See “TAP Registers” on 
page 224 for more details on the IR. 


■ Test Data Registers (TDR): The three TDRs are used to 
process the test data. Each TDR is selected by an 
instruction in the Instruction Register (IR). See “TAP 
Registers” on page 224 for a list of these registers and their 
functions. 


TAP Signals 


The test signals associated with the TAP controller are as 
follows: 


■ TCK: The Test Clock for all TAP operations. The rising edge 
of TCK is used for sampling TAP signals, and the falling 
edge of TCK is used for asserting TAP signals. The state of 
the TMS signal sampled on the rising edge of TCK causes 
the state transitions of the TAP controller to occur. TCK can 
be stopped in the logic 0 or 1 state. 


■ TDI: The Test Data Input represents the input to the most 
significant bit of all TAP registers, including the IR and all 
test data registers. Test data and instructions are serially 
shifted by one bit into their respective registers on the rising 
edge of TCK. 


■ TDO: The Test Data Output represents the output of the 
least significant bit of all TAP registers, including the IR and 
all test data registers. Test data and instructions are serially 
shifted by one bit out of their respective registers on the 
falling edge of TCK. 


■ TMS: The Test Mode Select input specifies the test 
function and sequence of state changes for boundary-scan 
testing. If TMS is sampled High for five or more consecutive 
clocks, the TAP controller enters its reset state. 


■ TRST#: The Test Reset signal is an asynchronous reset that 
unconditionally causes the TAP controller to enter its reset 
state. 


Refer to “Electrical Data” on page 253 and “Signal Switching 
Characteristics” on page 267 to obtain the electrical 
specifications of the test signals. 


TAP Registers 


The AMD-K6-2 processor provides an Instruction Register (IR) 
and three Test Data Registers (TDR) to support the 
boundary-scan architecture. The IR and one of the TDRs—the 
Boundary-Scan Register (BSR)—consist of a shift register and 
an output register. The shift register is loaded in parallel in the 
Capture states. (See “TAP Controller State Machine” on page 
232 for a description of the TAP controller states.) In addition, 
the shift register is loaded and shifted serially in the Shift 
states. The output register is loaded in parallel from its 
corresponding shift register in the Update states. 


Instruction Register (IR). The IR is a 5-bit register, without parity, 
that determines which instruction to run and which test data 
register to select. When the TAP controller enters the 
Capture-IR state, the processor loads the following bits into the 
IR shift register: 


■ 01b: Loaded into the two least significant bits, as specified 
by the IEEE 1149.1 standard 


■ 000b: Loaded into the three most significant bits 

Loading 00001b into the IR shift register during the Capture-IR 
state results in loading the SAMPLE/PRELOAD instruction. 


For each entry into the Shift-IR state, the IR shift register is 
serially shifted by one bit toward the TDO pin. During the shift, 
the most significant bit of the IR shift register is loaded from 
the TDI pin. 


The IR output register is loaded from the IR shift register in the 
Update-IR state, and the current instruction is defined by the IR 
output register. See “TAP Instructions” on page 231 for a list and 
definition of the instructions supported by the AMD-K6-2 
processor. 
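
The shift behavior described above (new bit enters at the most significant bit from TDI, least significant bit leaves toward TDO) can be modeled in a few lines. This is a behavioral sketch, not a model of the processor's logic; the function names are hypothetical, and the 5-bit pattern shifted in is an arbitrary example rather than a documented instruction encoding.

```python
from typing import List, Tuple

IR_BITS = 5
CAPTURE_IR_VALUE = 0b00001  # value loaded in the Capture-IR state


def shift_ir(ir: int, tdi: int) -> Tuple[int, int]:
    """One Shift-IR step: return (new shift-register value, bit driven toward TDO)."""
    tdo = ir & 1                                        # LSB moves out toward TDO
    new_ir = (ir >> 1) | ((tdi & 1) << (IR_BITS - 1))   # TDI enters at the MSB
    return new_ir, tdo


def scan_ir(pattern: int) -> Tuple[int, List[int]]:
    """Shift a 5-bit pattern in, LSB first, starting from the captured value."""
    ir = CAPTURE_IR_VALUE
    shifted_out = []
    for i in range(IR_BITS):
        ir, tdo = shift_ir(ir, (pattern >> i) & 1)
        shifted_out.append(tdo)
    return ir, shifted_out


final_ir, out_bits = scan_ir(0b00011)  # arbitrary example pattern
print(bin(final_ir), out_bits)
```

After five shifts the register holds the pattern that was shifted in, and the bits observed on TDO are the captured value 00001b, least significant bit first.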


Boundary Scan Register (BSR). The BSR is a Test Data Register 
consisting of the interconnection of 152 boundary-scan cells. 
Each output and bidirectional pin of the processor requires a 
two-bit cell, where one bit corresponds to the pin and the other 
bit is the output enable for the pin. When a 0 is shifted into the 
enable bit of a cell, the corresponding pin is floated, and when a 
1 is shifted into the enable bit, the pin is driven valid. Each 
input pin requires a one-bit cell that corresponds to the pin. The 
last cell of the BSR is reserved and does not correspond to any 
processor pin. 


The total number of bits that comprise the BSR is 281. The 
order of the bits in the BSR differs between the Model 8/[7:0] 
and the Model 8/[F:8] processors. Table 47 on page 227 and 
Table 48 on page 229 list the order of these bits, respectively, 
where TDI is the input to bit 280, and TDO is driven from the 
output of bit 0. The entries listed as pin_E (where pin is an 
output or bidirectional signal) are the enable bits. 


If the BSR is the register selected by the current instruction 
and the TAP controller is in the Capture-DR state, the processor 
loads the BSR shift register as follows: 


■ If the current instruction is SAMPLE/PRELOAD, then the 
current state of each input, output, and bidirectional pin is 
loaded. A bidirectional pin is treated as an output if its 
enable bit equals 1, and it is treated as an input if its enable 
bit equals 0. 

■ If the current instruction is EXTEST, then the current state 
of each input pin is loaded. A bidirectional pin is treated as 
an input, regardless of the state of its enable. 

While in the Shift-DR state, the BSR shift register is serially 
shifted toward the TDO pin. During the shift, bit 280 of the BSR 
is loaded from the TDI pin. 


The BSR output register is loaded with the contents of the BSR 
shift register in the Update-DR state. If the current instruction 
is EXTEST, the processor’s output pins, as well as those 
bidirectional pins that are enabled as outputs, are driven with 
their corresponding values from the BSR output register. 
Table 47. Boundary Scan Bit Definitions for Model 8/[7:0] 

Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable 
280 | D35_E | 247 | D21 | 214 | D4_E | 181 | A3 | 148 | A20 | 115 | A16 
279 | D35 | 246 | D18_E | 213 | D4 | 180 | A31_E | 147 | A13_E | 114 | FERR_E 
278 | D29_E | 245 | D18 | 212 | DP0_E | 179 | A31 | 146 | A13 | 113 | FERR# 
277 | D29 | 244 | D19_E | 211 | DP0 | 178 | A21_E | 145 | DP7_E | 112 | HIT_E 
276 | D33_E | 243 | D19 | 210 | HOLD | 177 | A21 | 144 | DP7 | 111 | HIT# 
275 | D33 | 242 | D16_E | 209 | BOFF# | 176 | A30_E | 143 | BE6_E | 110 | BE7_E 
274 | D27_E | 241 | D16 | 208 | AHOLD | 175 | A30 | 142 | BE6# | 109 | BE7# 
273 | D27 | 240 | D17_E | 207 | STPCLK# | 174 | A7_E | 141 | A12_E | 108 | NA# 
272 | DP3_E | 239 | D17 | 206 | INIT | 173 | A7 | 140 | A12 | 107 | ADSC_E 
271 | DP3 | 238 | D15_E | 205 | IGNNE# | 172 | A24_E | 139 | CLK | 106 | ADSC# 
270 | D25_E | 237 | D15 | 204 | BF1 | 171 | A24 | 138 | BE4_E | 105 | BE5_E 
269 | D25 | 236 | DP1_E | 203 | BF2 | 170 | A18_E | 137 | BE4# | 104 | BE5# 
268 | D0_E | 235 | DP1 | 202 | RESET | 169 | A18 | 136 | A10_E | 103 | WB/WT# 
267 | D0 | 234 | D13_E | 201 | BF0 | 168 | A5_E | 135 | A10 | 102 | PWT_E 
266 | D30_E | 233 | D13 | 200 | FLUSH# | 167 | A5 | 134 | D63_E | 101 | PWT 
265 | D30 | 232 | D6_E | 199 | INTR | 166 | A22_E | 133 | D63 | 100 | BE3_E 
264 | DP2_E | 231 | D6 | 198 | NMI | 165 | A22 | 132 | BE2_E | 99 | BE3# 
263 | DP2 | 230 | D14_E | 197 | SMI# | 164 | EADS# | 131 | BE2# | 98 | BREQ_E 
262 | D2_E | 229 | D14 | 196 | A25_E | 163 | A4_E | 130 | A15_E | 97 | BREQ 
261 | D2 | 228 | D11_E | 195 | A25 | 162 | A4 | 129 | A15 | 96 | PCD_E 
260 | D28_E | 227 | D11 | 194 | A23_E | 161 | HITM_E | 128 | BRDY# | 95 | PCD 
259 | D28 | 226 | D1_E | 193 | A23 | 160 | HITM# | 127 | BE1_E | 94 | WR_E 
258 | D24_E | 225 | D1 | 192 | A26_E | 159 | A9_E | 126 | BE1# | 93 | W/R# 
257 | D24 | 224 | D12_E | 191 | A26 | 158 | A9 | 125 | A14_E | 92 | SMIACT_E 
256 | D26_E | 223 | D12 | 190 | A29_E | 157 | SCYC_E | 124 | A14 | 91 | SMIACT# 
255 | D26 | 222 | D10_E | 189 | A29 | 156 | SCYC | 123 | BRDYC# | 90 | EWBE# 
254 | D22_E | 221 | D10 | 188 | A28_E | 155 | A8_E | 122 | BE0_E | 89 | DC_E 
253 | D22 | 220 | D7_E | 187 | A28 | 154 | A8 | 121 | BE0# | 88 | D/C# 
252 | D23_E | 219 | D7 | 186 | A27_E | 153 | A19_E | 120 | A17_E | 87 | APCHK_E 
251 | D23 | 218 | D8_E | 185 | A27 | 152 | A19 | 119 | A17 | 86 | APCHK# 
250 | D20_E | 217 | D8 | 184 | A11_E | 151 | A6_E | 118 | KEN# | 85 | CACHE_E 
249 | D20 | 216 | D9_E | 183 | A11 | 150 | A6 | 117 | A20M# | 84 | CACHE# 
248 | D21_E | 215 | D9 | 182 | A3_E | 149 | A20_E | 116 | A16_E | 83 | ADS_E 
Table 47. Boundary Scan Bit Definitions for Model 8/[7:0] (continued) 

Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable 
82 | ADS# | 68 | DP6_E | 54 | D53_E | 40 | D43_E | 26 | D38_E | 12 | D3_E 
81 | AP_E | 67 | DP6 | 53 | D53 | 39 | D43 | 25 | D38 | 11 | D3 
80 | AP | 66 | D54_E | 52 | D47_E | 38 | D62_E | 24 | D58_E | 10 | D39_E 
79 | INV | 65 | D54 | 51 | D47 | 37 | D62 | 23 | D58 | 9 | D39 
78 | HLDA_E | 64 | D50_E | 50 | D59_E | 36 | D49_E | 22 | D42_E | 8 | D32_E 
77 | HLDA | 63 | D50 | 49 | D59 | 35 | D49 | 21 | D42 | 7 | D32 
76 | PCHK_E | 62 | D56_E | 48 | D51_E | 34 | DP4_E | 20 | D36_E | 6 | D5_E 
75 | PCHK# | 61 | D56 | 47 | D51 | 33 | DP4 | 19 | D36 | 5 | D5 
74 | LOCK_E | 60 | D55_E | 46 | D45_E | 32 | D46_E | 18 | D60_E | 4 | D37_E 
73 | LOCK# | 59 | D55 | 45 | D45 | 31 | D46 | 17 | D60 | 3 | D37 
72 | MIO_E | 58 | D48_E | 44 | D61_E | 30 | D41_E | 16 | D40_E | 2 | D31_E 
71 | M/IO# | 57 | D48 | 43 | D61 | 29 | D41 | 15 | D40 | 1 | D31 
70 | D52_E | 56 | D57_E | 42 | DP5_E | 28 | D44_E | 14 | D34_E | 0 | Reserved 
69 | D52 | 55 | D57 | 41 | DP5 | 27 | D44 | 13 | D34 | | 
Table 48. Boundary Scan Bit Definitions for Model 8/[F:8]

Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable
280 | D35_E 247 | D19 214 | BF1 181 | A24 148 | A14 115 | BE7#
279 | D35 246 | D16_E 213 | BF2 180 | A18_E 147 | A17_E 114 | PCD_E
278 | D29_E 245 | D16 212 | RESET 179 | A18 146 | A17 113 | PCD
277 | D29 244 | D17_E 211 | BF0 178 | A5_E 145 | A16_E 112 | DC_E
276 | D33_E 243 | D17 210 | FLUSH# 177 | A5 144 | A16 111 | D/C#
275 | D33 242 | D15_E 209 | INTR 176 | EADS# 143 | HIT_E 110 | WR_E
274 | D27_E 241 | D15 208 | NMI 175 | A22_E 142 | HIT# 109 | W/R#
273 | D27 240 | DP1_E 207 | SMI# 174 | A22 141 | ADS_E 108 | NA#
272 | DP0_E 239 | DP1 206 | A25_E 173 | AHOLD 140 | ADS# 107 | PWT_E
271 | DP0 238 | D13_E 205 | A25 172 | HITM_E 139 | CLK 106 | PWT
270 | DP3_E 237 | D13 204 | A26_E 171 | HITM# 138 | ADSC_E 105 | CACHE_E
269 | DP3 236 | D6_E 203 | A26 170 | A4_E 137 | ADSC# 104 | CACHE#
268 | D25_E 235 | D6 202 | A29_E 169 | A4 136 | BE0_E 103 | WB/WT#
267 | D25 234 | D14_E 201 | A29 168 | A9_E 135 | BE0# 102 | MIO_E
266 | D0_E 233 | D14 200 | A28_E 167 | A9 134 | AP_E 101 | M/IO#
265 | D0 232 | D11_E 199 | A28 166 | A8_E 133 | AP 100 | BREQ_E
264 | D30_E 231 | D11 198 | A23_E 165 | A8 132 | BE1_E 99 | BREQ
263 | D30 230 | D1_E 197 | A23 164 | A19_E 131 | BE1# 98 | SCYC_E
262 | DP2_E 229 | D1 196 | A27_E 163 | A19 130 | BE2_E 97 | SCYC
261 | DP2 228 | D12_E 195 | A27 162 | BOFF# 129 | BE2# 96 | LOCK_E
260 | D2_E 227 | D12 194 | A11_E 161 | A6_E 128 | BRDY# 95 | LOCK#
259 | D2 226 | D10_E 193 | A11 160 | A6 127 | BE3_E 94 | APCHK_E
258 | D28_E 225 | D10 192 | A3_E 159 | A20_E 126 | BE3# 93 | APCHK#
257 | D28 224 | D7_E 191 | A3 158 | A20 125 | BE4_E 92 | PCHK_E
256 | D24_E 223 | D7 190 | A31_E 157 | A13_E 124 | BE4# 91 | PCHK#
255 | D24 222 | D8_E 189 | A31 156 | A13 123 | BRDYC# 90 | EWBE#
254 | D26_E 221 | D8 188 | A21_E 155 | A12_E 122 | BE5_E 89 | SMIACT_E
253 | D26 220 | D9_E 187 | A21 154 | A12 121 | BE5# 88 | SMIACT#
252 | D21_E 219 | D9 186 | A30_E 153 | A10_E 120 | BE6_E 87 | FERR_E
251 | D21 218 | HOLD 185 | A30 152 | A10 119 | BE6# 86 | FERR#
250 | D18_E 217 | STPCLK# 184 | A7_E 151 | A15_E 118 | KEN# 85 | D20_E
249 | D18 216 | INIT 183 | A7 150 | A15 117 | INV 84 | D20
248 | D19_E 215 | IGNNE# 182 | A24_E 149 | A14_E 116 | BE7_E 83 | D22_E

Table 48. Boundary Scan Bit Definitions for Model 8/[F:8] (continued)

Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable | Bit | Pin/Enable
82 | D22 68 | D54_E 54 | D47_E 40 | D62_E 26 | D38_E 12 | D3_E
81 | D23_E 67 | D54 53 | D47 39 | D62 25 | D38 11 | D3
80 | D23 66 | D50_E 52 | D59_E 38 | D49_E 24 | D58_E 10 | D39_E
79 | A20M# 65 | D50 51 | D59 37 | D49 23 | D58 9 | D39
78 | HLDA_E 64 | D56_E 50 | D51_E 36 | DP4_E 22 | D42_E 8 | D32_E
77 | HLDA 63 | D56 49 | D51 35 | DP4 21 | D42 7 | D32
76 | DP7_E 62 | D55_E 48 | D45_E 34 | D4_E 20 | D36_E 6 | D5_E
75 | DP7 61 | D55 47 | D45 33 | D4 19 | D36 5 | D5
74 | D63_E 60 | D48_E 46 | D61_E 32 | D46_E 18 | D60_E 4 | D37_E
73 | D63 59 | D48 45 | D61 31 | D46 17 | D60 3 | D37
72 | D52_E 58 | D57_E 44 | DP5_E 30 | D41_E 16 | D40_E 2 | D31_E
71 | D52 57 | D57 43 | DP5 29 | D41 15 | D40 1 | D31
70 | DP6_E 56 | D53_E 42 | D43_E 28 | D44_E 14 | D34_E 0 | Reserved
69 | DP6 55 | D53 41 | D43 27 | D44 13 | D34

Device Identification Register (DIR). The DIR is a 32-bit Test Data 
Register selected during the execution of the IDCODE 
instruction. The fields of the DIR and their values are shown in 
Table 49 and are defined as follows: 
■ Version Code: This 4-bit field is incremented by AMD
  manufacturing for each major revision of silicon.
■ Part Number: This 16-bit field identifies the specific
  processor model.
■ Manufacturer: This 11-bit field identifies the manufacturer
  of the component (AMD).
■ LSB: The least significant bit (LSB) of the DIR is always set
  to 1, as specified by the IEEE 1149.1 standard.

Table 49. Device Identification Register

Version Code | Part Number | Manufacturer | LSB
(Bits 31-28) | (Bits 27-12) | (Bits 11-1) | (Bit 0)
Xh | 0580h | 00000000001b | 1b

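
The field layout in Table 49 can be illustrated with a short decoding sketch. This is only an illustration of the bit positions given above; the names `DeviceId` and `decode_dir` are hypothetical, and the version value used in the example is arbitrary because the data sheet lists it as Xh.

```python
from typing import NamedTuple


class DeviceId(NamedTuple):
    """Fields of the 32-bit Device Identification Register (Table 49)."""
    version: int       # bits 31-28
    part_number: int   # bits 27-12
    manufacturer: int  # bits 11-1
    lsb: int           # bit 0, always 1 per IEEE 1149.1


def decode_dir(value: int) -> DeviceId:
    """Split a 32-bit IDCODE value into its four fields."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("the DIR is a 32-bit register")
    return DeviceId(
        version=(value >> 28) & 0xF,
        part_number=(value >> 12) & 0xFFFF,
        manufacturer=(value >> 1) & 0x7FF,
        lsb=value & 0x1,
    )


# Example with an arbitrary version code of 2h (assumed for illustration).
example = decode_dir(0x20580003)
```

In this sketch, a value whose `part_number` is 0580h and whose `manufacturer` is 00000000001b matches the entries in Table 49.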
Bypass Register (BR). The BR is a Test Data Register consisting of
a 1-bit shift register that provides the shortest path between 
TDI and TDO. When the processor is not involved in a test 
operation, the BR can be selected by an instruction to allow the 
transfer of test data through the processor without having to 
serially scan the test data through the BSR. This functionality 
preserves the state of the BSR and significantly reduces test 
time. 


The BR register is selected by the BYPASS and HIGHZ 
instructions as well as by any instructions not supported by the 
AMD-K6-2 processor. 


The processor supports the three instructions required by the
IEEE 1149.1 standard (EXTEST, SAMPLE/PRELOAD, and
BYPASS) as well as two additional optional instructions
(IDCODE and HIGHZ).


Table 50 shows the complete set of TAP instructions supported 
by the processor along with the 5-bit Instruction Register 
encoding and the register selected by each instruction. 


Table 50. Supported TAP Instructions

Instruction | Encoding | Register | Description
EXTEST (note 1) | 00000b | BSR | Sample inputs and drive outputs
SAMPLE/PRELOAD | 00001b | BSR | Sample inputs and outputs, then load the BSR
IDCODE | 00010b | DIR | Read DIR
HIGHZ | 00011b | BR | Float outputs and bidirectional pins
BYPASS (note 2) | 00100b-11110b | BR | Undefined instruction, execute the BYPASS instruction
BYPASS (note 3) | 11111b | BR | Connect TDI to TDO to bypass the BSR

Notes:
1. Following the execution of the EXTEST instruction, the processor must be reset in order to return to normal, non-test operation.
2. These instruction encodings are undefined on the AMD-K6-2 processor and default to the BYPASS instruction.
3. Because the TDI input contains an internal pullup, the BYPASS instruction is executed if the TDI input is not connected or open
during an instruction scan operation. The BYPASS instruction does not affect the normal operational state of the processor.
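
As a rough illustration of Table 50, the following sketch maps a 5-bit Instruction Register value to the instruction name and the register it selects. The function name `decode_tap_instruction` is hypothetical; only the encodings come from the table.

```python
from typing import Tuple

# Defined encodings from Table 50: value -> (instruction, selected register).
_DEFINED = {
    0b00000: ("EXTEST", "BSR"),
    0b00001: ("SAMPLE/PRELOAD", "BSR"),
    0b00010: ("IDCODE", "DIR"),
    0b00011: ("HIGHZ", "BR"),
}


def decode_tap_instruction(ir: int) -> Tuple[str, str]:
    """Return (instruction, register) for a 5-bit IR encoding.

    Encodings 00100b through 11110b are undefined and default to BYPASS,
    and 11111b is BYPASS itself, so every other value selects the BR.
    """
    if not 0 <= ir <= 0b11111:
        raise ValueError("the Instruction Register is 5 bits wide")
    return _DEFINED.get(ir, ("BYPASS", "BR"))
```

Treating BYPASS as the fallback mirrors notes 2 and 3 of the table rather than listing all 28 remaining encodings.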














EXTEST. When the EXTEST instruction is executed, the 
processor loads the BSR shift register with the current state of 
the input and bidirectional pins in the Capture-DR state and 
drives the output and bidirectional pins with the corresponding 
values from the BSR output register in the Update-DR state. 
TAP Controller State Machine


SAMPLE/PRELOAD. The SAMPLE/PRELOAD instruction performs 
two functions. These functions are as follows: 


■ During the Capture-DR state, the processor loads the BSR
  shift register with the current state of every input, output,
  and bidirectional pin.

■ During the Update-DR state, the BSR output register is
  loaded from the BSR shift register in preparation for the
  next EXTEST instruction.


The SAMPLE/PRELOAD instruction does not affect the normal 
operational state of the processor. 


BYPASS. The BYPASS instruction selects the BR register, which 
reduces the boundary-scan length through the processor from 
281 to one (TDI to BR to TDO). The BYPASS instruction does 
not affect the normal operational state of the processor. 
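
To make the benefit of BYPASS concrete, the sketch below estimates the data-register scan length of a board-level chain. It assumes, purely for illustration, a chain made only of devices with the 281-bit BSR described here; the function name and the chain model are hypothetical.

```python
from typing import Iterable

BSR_LENGTH = 281  # boundary-scan length through the processor (from the text)
BR_LENGTH = 1     # length of the Bypass Register


def chain_scan_length(selects_bsr: Iterable[bool]) -> int:
    """Total bits between chain TDI and TDO.

    Each entry is True if that device has the BSR selected and False if it
    is executing BYPASS (BR selected).
    """
    return sum(BSR_LENGTH if sel else BR_LENGTH for sel in selects_bsr)


# Three devices in series, only the first under test: 281 + 1 + 1 bits.
length = chain_scan_length([True, False, False])
```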


IDCODE. The IDCODE instruction selects the DIR register, 
allowing the device identification code to be shifted out of the 
processor. This instruction is loaded into the IR when the TAP 
controller is reset. The IDCODE instruction does not affect the 
normal operational state of the processor. 


HIGHZ. The HIGHZ instruction forces all output and 
bidirectional pins to be floated. During this instruction, the BR 
is selected and the normal operational state of the processor is 
not affected. 


The TAP controller state diagram is shown in Figure 86 on page 
233. State transitions occur on the rising edge of TCK. The logic 
0 or 1 next to the states represents the value of the TMS signal 
sampled by the processor on the rising edge of TCK. 
Figure 86. TAP State Diagram

The states of the TAP controller are described as follows:


Test-Logic-Reset. This state represents the initial reset state of the 
TAP controller and is entered when the processor samples 
RESET asserted, when TRST# is asynchronously asserted, and 
when TMS is sampled High for five or more consecutive clocks. 
In addition, this state can be entered from the Select-IR-Scan 
state. The IR is initialized with the IDCODE instruction, and 
the processor’s normal operation is not affected in this state. 


Capture-DR. During the SAMPLE/PRELOAD instruction, the 
processor loads the BSR shift register with the current state of 
every input, output, and bidirectional pin. During the EXTEST 
instruction, the processor loads the BSR shift register with the 
current state of every input and bidirectional pin. 


Capture-IR. When the TAP controller enters the Capture-IR state, 
the processor loads 01b into the two least significant bits of the 
IR shift register and loads 000b into the three most significant 
bits of the IR shift register. 


Shift-DR. While in the Shift-DR state, the selected TDR shift 
register is serially shifted toward the TDO pin. During the shift, 
the most significant bit of the TDR is loaded from the TDI pin. 


Shift-IR. While in the Shift-IR state, the IR shift register is 
serially shifted toward the TDO pin. During the shift, the most 
significant bit of the IR is loaded from the TDI pin. 


Update-DR. During the SAMPLE/PRELOAD instruction, the BSR 
output register is loaded with the contents of the BSR shift 
register. During the EXTEST instruction, the output pins, as 
well as those bidirectional pins defined as outputs, are driven 
with their corresponding values from the BSR output register. 


Update-IR. In this state, the IR output register is loaded from the 
IR shift register, and the current instruction is defined by the 
IR output register. 
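
The capture, shift, and update behavior described above can be modeled with a small register class. This is a simplified software sketch, not a description of the hardware implementation: the class name `ScanRegister` is hypothetical, and index 0 of each list stands for bit 0, the bit nearest TDO.

```python
from typing import List


class ScanRegister:
    """Shift register plus output register, as used by the BSR and the IR."""

    def __init__(self, length: int) -> None:
        self.shift: List[int] = [0] * length   # shift stage
        self.output: List[int] = [0] * length  # latched output stage

    def capture(self, values: List[int]) -> None:
        """Capture-DR/IR: load the shift stage in parallel."""
        if len(values) != len(self.shift):
            raise ValueError("wrong number of bits")
        self.shift = list(values)

    def shift_bit(self, tdi: int) -> int:
        """Shift-DR/IR: move one position toward TDO.

        The least significant bit leaves on TDO and the most significant
        bit is loaded from TDI.
        """
        tdo = self.shift[0]
        self.shift = self.shift[1:] + [tdi]
        return tdo

    def update(self) -> None:
        """Update-DR/IR: copy the shift stage to the output stage."""
        self.output = list(self.shift)
```

Because the output stage changes only in `update`, values can be shifted through the register without disturbing what the output stage currently holds, which is the property SAMPLE/PRELOAD relies on.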
The following states have no effect on the normal or test
operation of the processor other than as shown in Figure 86 on
page 233:

■ Run-Test/Idle: This state is an idle state between scan
  operations.

■ Select-DR-Scan: This is the initial state of the test data
  register state transitions.

■ Select-IR-Scan: This is the initial state of the Instruction
  Register state transitions.

■ Exit1-DR: This state is entered to terminate the shifting
  process and enter the Update-DR state.

■ Exit1-IR: This state is entered to terminate the shifting
  process and enter the Update-IR state.

■ Pause-DR: This state is entered to temporarily stop the
  shifting process of a Test Data Register.

■ Pause-IR: This state is entered to temporarily stop the
  shifting process of the Instruction Register.

■ Exit2-DR: This state is entered in order to either terminate
  the shifting process and enter the Update-DR state or to
  resume shifting following the exit from the Pause-DR state.

■ Exit2-IR: This state is entered in order to either terminate
  the shifting process and enter the Update-IR state or to
  resume shifting following the exit from the Pause-IR state.
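
The state transitions can be sketched as a lookup table indexed by the current state and the TMS value sampled on the rising edge of TCK. The table below follows the standard IEEE 1149.1 TAP state diagram, which Figure 86 is assumed to reproduce; the names `NEXT_STATE` and `step` are hypothetical.

```python
# state -> (next state when TMS = 0, next state when TMS = 1)
NEXT_STATE = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Return the next TAP state for one rising edge of TCK."""
    return NEXT_STATE[state][1 if tms else 0]


def run(state: str, tms_sequence) -> str:
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_sequence:
        state = step(state, tms)
    return state
```

With this table, holding TMS High for five clocks reaches Test-Logic-Reset from any starting state, which matches the reset condition described for that state.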


11.4 L1 Cache Inhibit 


Purpose 


The AMD-K6-2 processor provides a means for inhibiting the 
normal operation of its L1 instruction and data caches while 
still supporting an external cache. This capability allows system 
designers to disable the L1 cache during the testing and debug 
of an external cache. 


If the Cache Inhibit bit (bit 3) of Test Register 12 (TR12) is set 
to 0, the processor’s L1 cache is enabled and operates as 
described in “Cache Organization” on page 179. If the Cache 
Inhibit bit is set to 1, the L1 cache is disabled and no new cache 
lines are allocated. Even though new allocations do not occur, 
valid L1 cache lines remain valid and are read by the processor 
when a requested address hits a cache line. In addition, the
processor continues to support inquire cycles initiated by the
system logic, including the execution of writeback cycles when
a modified cache line is hit.


While the L1 cache is inhibited, the processor continues to drive
the PCD output signal appropriately, which system logic can use to
control external caching.


In order to completely disable the L1 cache so no valid lines
exist in the cache, the Cache Inhibit bit must be set to 1 and the
cache must be flushed in one of the following ways:


■ Asserting the FLUSH# input signal

■ Executing the WBINVD instruction

■ Executing the INVD instruction (modified cache lines are
  not written back to memory)

■ Using the Page Flush/Invalidate Register (PFIR)
  (AMD-K6-2/[F:8] only) (see “PFIR” on page 195)


The AMD-K6-2 processor implements the standard x86 debug
functions, registers, and exceptions. In addition, the processor
supports the I/O breakpoint debug extension. The debug
feature assists programmers and system designers during
software execution tracing by generating exceptions when one
or more events occur during processor execution. The exception
handler, or debugger, can be written to perform various tasks,
such as displaying the conditions that caused the breakpoint to
occur, displaying and modifying register or memory contents, or
single-stepping through program execution.


The following sections describe the debug registers and the
various types of breakpoints and exceptions that the processor
supports.


Debug Registers


Figures 87 through 90 show the 32-bit debug registers
supported by the processor.
Figure 87. Debug Register DR7

[Register diagram not reproduced. The fields shown in the figure are
listed below; the figure marks the remaining bits as Reserved.]

Symbol | Description | Bits
LEN3 | Length of Breakpoint #3 | 31-30
R/W3 | Type of Transaction(s) to Trap | 29-28
LEN2 | Length of Breakpoint #2 | 27-26
R/W2 | Type of Transaction(s) to Trap | 25-24
LEN1 | Length of Breakpoint #1 | 23-22
R/W1 | Type of Transaction(s) to Trap | 21-20
LEN0 | Length of Breakpoint #0 | 19-18
R/W0 | Type of Transaction(s) to Trap | 17-16
GD | General Detect Enabled | 13
GE | Global Exact Breakpoint Enabled | 9
LE | Local Exact Breakpoint Enabled | 8
G3 | Global Exact Breakpoint #3 Enabled | 7
L3 | Local Exact Breakpoint #3 Enabled | 6
G2 | Global Exact Breakpoint #2 Enabled | 5
L2 | Local Exact Breakpoint #2 Enabled | 4
G1 | Global Exact Breakpoint #1 Enabled | 3
L1 | Local Exact Breakpoint #1 Enabled | 2
G0 | Global Exact Breakpoint #0 Enabled | 1
L0 | Local Exact Breakpoint #0 Enabled | 0

Figure 88. Debug Register DR6

[Register diagram not reproduced. The fields shown in the figure are
listed below; the figure marks the remaining bits as Reserved.]

Symbol | Description | Bit
BT | Breakpoint Task Switch | 15
BS | Breakpoint Single Step | 14
BD | Breakpoint Debug Access Detected | 13
B3 | Breakpoint #3 Condition Detected | 3
B2 | Breakpoint #2 Condition Detected | 2
B1 | Breakpoint #1 Condition Detected | 1
B0 | Breakpoint #0 Condition Detected | 0


Figure 89. Debug Registers DR5 and DR4

[Register diagrams not reproduced. The figure shows DR5 and DR4
as Reserved.]


Figure 90. Debug Registers DR3, DR2, DR1, and DR0

Register | Bits | Contents
DR3 | 31-0 | Breakpoint 3 32-bit Linear Address
DR2 | 31-0 | Breakpoint 2 32-bit Linear Address
DR1 | 31-0 | Breakpoint 1 32-bit Linear Address
DR0 | 31-0 | Breakpoint 0 32-bit Linear Address


DR3-DR0. The processor allows the setting of up to four
breakpoints. DR3-DR0 contain the linear addresses for
breakpoint 3 through breakpoint 0, respectively, and are
compared to the linear addresses of processor cycles to
determine if a breakpoint occurs. Debug register DR7 defines
the specific type of cycle that must occur in order for the
breakpoint to occur.


DR5-DR4. When debugging extensions are disabled (bit 3 of CR4
is set to 0), the DR5 and DR4 registers are mapped to DR7 and
DR6, respectively, in order to be software compatible with
previous generations of x86 processors. When debugging
extensions are enabled (bit 3 of CR4 is set to 1), any attempt to
load DR5 or DR4 results in an undefined opcode exception.
Likewise, any attempt to store DR5 or DR4 also results in an
undefined opcode exception.
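
The DR5/DR4 mapping can be summarized in a few lines of illustrative code. This is a behavioral sketch only; `resolve_debug_register` and `UndefinedOpcode` are hypothetical names, and the function models which register a load or store of DRn actually reaches.

```python
CR4_DE = 1 << 3  # debugging extensions enable bit in CR4


class UndefinedOpcode(Exception):
    """Stands in for the undefined opcode exception."""


def resolve_debug_register(index: int, cr4: int) -> int:
    """Return the debug register actually accessed for DR<index>."""
    if not 0 <= index <= 7:
        raise ValueError("debug registers are DR0 through DR7")
    if index in (4, 5):
        if cr4 & CR4_DE:
            # With debugging extensions enabled, DR4 and DR5 are not accessible.
            raise UndefinedOpcode(f"DR{index} access with CR4.DE = 1")
        # With debugging extensions disabled, DR4 maps to DR6 and DR5 to DR7.
        return index + 2
    return index
```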


DR6. If a breakpoint is enabled in DR7, and the breakpoint
conditions as defined in DR7 occur, then the corresponding
B-bit (B3-B0) in DR6 is set to 1. In addition, any other
breakpoints defined using these particular breakpoint
conditions are reported by the processor by setting the
appropriate B-bits in DR6, regardless of whether these
breakpoints are enabled or disabled. However, if a breakpoint is
not enabled, a debug exception does not occur for that
breakpoint.


If the processor decodes an instruction that writes or reads DR7
through DR0, the BD bit (bit 13) in DR6 is set to 1 (if enabled in
DR7) and the processor generates a debug exception. This
operation allows control to pass to the debugger prior to debug
register access by software.


If the Trap Flag (bit 8) of the EFLAGS register is set to 1, the 
processor generates a debug exception after the successful 
execution of every instruction (single-step operation) and sets 
the BS bit (bit 14) in DR6 to indicate the source of the 
exception. 


When the processor switches to a new task and the debug trap 
bit (T-bit) in the corresponding Task State Segment (TSS) is set 
to 1, the processor sets the BT bit (bit 15) in DR6 and generates 
a debug exception. 
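
A debugger typically inspects DR6 to find out why it was entered. The sketch below decodes the status bits named above; the function name `decode_dr6` and the use of a plain integer for the register value are assumptions made for illustration.

```python
from typing import List

# Status bit positions in DR6, as listed for Figure 88.
DR6_BITS = {
    "B0": 0,
    "B1": 1,
    "B2": 2,
    "B3": 3,
    "BD": 13,  # debug register access detected
    "BS": 14,  # single step
    "BT": 15,  # task switch
}


def decode_dr6(value: int) -> List[str]:
    """Return the names of the DR6 status bits that are set to 1."""
    return [name for name, bit in DR6_BITS.items() if (value >> bit) & 1]
```

For instance, a value with bits 14 and 0 set would report both a single-step trap and a match on breakpoint 0.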


DR7. When set to 1, L3-L0 locally enable breakpoints 3 through
0, respectively. L3-L0 are set to 0 whenever the processor
executes a task switch. Setting L3-L0 to 0 disables the
breakpoints and ensures that these particular debug exceptions
are only generated for a specific task.


When set to 1, G3-G0 globally enable breakpoints 3 through 0,
respectively. Unlike L3-L0, G3-G0 are not set to 0 whenever the
processor executes a task switch. Not setting G3-G0 to 0 allows
breakpoints to remain enabled across all tasks. If a breakpoint
is enabled globally but disabled locally, the global enable
overrides the local enable.


The LE (bit 8) and GE (bit 9) bits in DR7 have no effect on the
operation of the processor and are provided in order to be
software compatible with previous generations of x86
processors.


When set to 1, the GD bit in DR7 (bit 13) enables the debug
exception associated with the BD bit (bit 13) in DR6. This bit is
set to 0 when a debug exception is generated.


LEN3-LEN0 and RW3-RW0 are two-bit fields in DR7 that
specify the length and type of each breakpoint as defined in
Table 51.


Table 51. DR7 LEN and RW Definitions

LEN Bits (note 1) | RW Bits | Breakpoint
00b | 00b (note 2) | Instruction Execution
00b | 01b | One-byte Data Write
01b | 01b | Two-byte Data Write
11b | 01b | Four-byte Data Write
00b | 10b (note 3) | One-byte I/O Read or Write
01b | 10b (note 3) | Two-byte I/O Read or Write
11b | 10b (note 3) | Four-byte I/O Read or Write
00b | 11b | One-byte Data Read or Write
01b | 11b | Two-byte Data Read or Write
11b | 11b | Four-byte Data Read or Write

Notes:
1. LEN bits equal to 10b is undefined.
2. When RW equals 00b, LEN must be equal to 00b.
3. When RW equals 10b, debugging extensions (DE) must be enabled (bit 3 of CR4 must be set
to 1). If DE is set to 0, then RW equal to 10b is undefined.
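
The following sketch shows how the enable, RW, and LEN fields for one breakpoint might be combined into a DR7 value, using the bit positions from Figure 87 and the encodings from Table 51. The function and constant names are hypothetical, and the checks simply reject the combinations the notes call undefined.

```python
# RW encodings from Table 51.
RW_EXECUTE = 0b00
RW_WRITE = 0b01
RW_IO = 0b10          # requires debugging extensions (CR4 bit 3 = 1)
RW_READ_WRITE = 0b11

# LEN encodings from Table 51, keyed by length in bytes (10b is undefined).
LEN_ENCODING = {1: 0b00, 2: 0b01, 4: 0b11}


def dr7_breakpoint_bits(n: int, rw: int, length: int = 1,
                        local: bool = True, global_: bool = False,
                        de_enabled: bool = False) -> int:
    """Return the DR7 bits that configure breakpoint n (0 through 3)."""
    if n not in (0, 1, 2, 3):
        raise ValueError("breakpoint number must be 0 through 3")
    if rw not in (RW_EXECUTE, RW_WRITE, RW_IO, RW_READ_WRITE):
        raise ValueError("RW is a two-bit field")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints require LEN = 00b")
    if rw == RW_IO and not de_enabled:
        raise ValueError("I/O breakpoints require debugging extensions")

    value = 0
    if local:
        value |= 1 << (2 * n)        # L0 = bit 0, L1 = bit 2, ...
    if global_:
        value |= 1 << (2 * n + 1)    # G0 = bit 1, G1 = bit 3, ...
    value |= rw << (16 + 4 * n)      # R/W0 = bits 17-16, R/W1 = bits 21-20, ...
    value |= LEN_ENCODING[length] << (18 + 4 * n)  # LEN0 = bits 19-18, ...
    return value
```

The values for several breakpoints can be combined with a bitwise OR before being written to DR7.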





A debug exception is categorized as either a debug trap or a
debug fault. A debug trap calls the debugger following the 
execution of the instruction that caused the trap. A debug fault 
calls the debugger prior to the execution of the instruction that 
caused the fault. All debug traps and faults generate either an 
Interrupt 01h or an Interrupt 03h exception. 
Interrupt 01h. The following events are considered debug traps
that cause the processor to generate an Interrupt 01h
exception:


■ Enabled breakpoints for data and I/O cycles
■ Single Step Trap
■ Task Switch Trap


The following events are considered debug faults that cause the
processor to generate an Interrupt 01h exception:


■ Enabled breakpoints for instruction execution
■ BD bit in DR6 set to 1


Interrupt 03h. The INT 3 instruction is defined in the x86 
architecture as a breakpoint instruction. This instruction 
causes the processor to generate an Interrupt 03h exception. 
This exception is a debug trap because the debugger is called 
following the execution of the INT 3 instruction. 


The INT 3 instruction is a one-byte instruction (opcode CCh) 
typically used to insert a breakpoint in software by writing CCh 
to the address of the first byte of the instruction to be trapped 
(the target instruction). Following the trap, if the target 
instruction is to be executed, the debugger must replace the 
INT 3 instruction with the first byte of the target instruction. 
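
The patch-and-restore procedure described above can be sketched on a byte buffer that stands in for target memory. The class `SoftwareBreakpoints` and the sample bytes are hypothetical; a real debugger would write to the debugged program's address space instead.

```python
INT3_OPCODE = 0xCC  # one-byte INT 3 instruction


class SoftwareBreakpoints:
    """Tracks the original first bytes of patched target instructions."""

    def __init__(self, memory: bytearray) -> None:
        self.memory = memory
        self.saved = {}  # address -> original first byte

    def insert(self, address: int) -> None:
        """Overwrite the first byte of the target instruction with CCh."""
        if address in self.saved:
            return  # already patched; keep the original byte we saved
        self.saved[address] = self.memory[address]
        self.memory[address] = INT3_OPCODE

    def remove(self, address: int) -> None:
        """Restore the first byte so the target instruction can execute."""
        self.memory[address] = self.saved.pop(address)


# Example: patch the instruction that starts at offset 1.
code = bytearray([0x90, 0xB8, 0x01, 0x00, 0x00, 0x00])
breakpoints = SoftwareBreakpoints(code)
breakpoints.insert(1)
```

Saving the original byte at insertion time is what allows the debugger to put the target instruction back after the trap.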
The AMD-K6-2 processor supports five modes of clock control.
The processor can transition between these modes to maximize 
performance, to minimize power dissipation, or to provide a 
balance between performance and power. (See “Power 
Dissipation” on page 257 for the maximum power dissipation of 
the AMD-K6-2 processor within the normal and reduced-power 
states.) 


The five clock-control states supported are as follows: 


Normal State: The processor is running in Real Mode, 
Virtual-8086 Mode, Protected Mode, or System Management 
Mode (SMM). In this state, all clocks are running—including 
the external bus clock CLK and the internal processor 
clock—and the full features and functions of the processor 
are available. 


Halt State: This low-power state is entered following the 
successful execution of the HLT instruction. During this 
state, the internal processor clock is stopped. 


Stop Grant State: This low-power state is entered following 
the recognition of the assertion of the STPCLK# signal. 
During this state, the internal processor clock is stopped. 


Stop Grant Inquire State: This state is entered from the Halt 
state and the Stop Grant state as the result of a 
system-initiated inquire cycle. 

Stop Clock State: This low-power state is entered from the 
Stop Grant state when the CLK signal is stopped. 


The following sections describe each of the four low-power 
states. Figure 91 on page 248 illustrates the clock control state 
transitions. 
12.1 Halt State


Enter Halt State


During the execution of the HLT instruction, the AMD-K6-2
processor executes a Halt special cycle. After BRDY# is
sampled asserted during this cycle, and then EWBE# is also
sampled asserted (if not masked off), the processor enters the
Halt state in which the processor disables most of its internal
clock distribution. In order to support the following operations,
the internal phase-lock loop (PLL) still runs, and some internal
resources are still clocked in the Halt state:


■ Inquire Cycles: The processor continues to sample AHOLD,
  BOFF#, and HOLD in order to support inquire cycles that
  are initiated by the system logic. The processor transitions to
  the Stop Grant Inquire state during the inquire cycle. After
  returning to the Halt state following the inquire cycle, the
  processor does not execute another Halt special cycle.

■ Flush Cycles: The processor continues to sample FLUSH#. If
  FLUSH# is sampled asserted, the processor performs the
  flush operation in the same manner as it is performed in the
  Normal state. Upon completing the flush operation, the
  processor executes the Halt special cycle which indicates
  the processor is in the Halt state.

■ Time Stamp Counter (TSC): The TSC continues to count in
  the Halt state.

■ Signal Sampling: The processor continues to sample INIT,
  INTR, NMI, RESET, and SMI#.


After entering the Halt state, all signals driven by the processor
retain their state as they existed following the completion of
the Halt special cycle.


Exit Halt State


The AMD-K6-2 processor remains in the Halt state until it
samples INIT, INTR (if interrupts are enabled), NMI, RESET, or
SMI# asserted. If any of these signals is sampled asserted, the
processor returns to the Normal state and performs the
corresponding operation. All of the normal requirements for
recognition of these input signals apply within the Halt state.
12.2 Stop Grant State


Enter Stop Grant State


After recognizing the assertion of STPCLK#, the AMD-K6-2
processor flushes its instruction pipelines, completes all
pending and in-progress bus cycles, and acknowledges the
STPCLK# assertion by executing a Stop Grant special bus cycle.
After BRDY# is sampled asserted during this cycle, and then
EWBE# is also sampled asserted (if not masked off), the
processor enters the Stop Grant state. The Stop Grant state is
like the Halt state in that the processor disables most of its
internal clock distribution in the Stop Grant state. In order to
support the following operations, the internal PLL still runs,
and some internal resources are still clocked in the Stop Grant
state:


■ Inquire Cycles: The processor transitions to the Stop Grant
  Inquire state during an inquire cycle. After returning to the
  Stop Grant state following the inquire cycle, the processor
  does not execute another Stop Grant special cycle.

■ Time Stamp Counter (TSC): The TSC continues to count in
  the Stop Grant state.

■ Signal Sampling: The processor continues to sample INIT,
  INTR, NMI, RESET, and SMI#.


FLUSH# is not recognized in the Stop Grant state (unlike while
in the Halt state).


Upon entering the Stop Grant state, all signals driven by the
processor retain their state as they existed following the
completion of the Stop Grant special cycle.


Exit Stop Grant State


The AMD-K6-2 processor remains in the Stop Grant state until
it samples STPCLK# negated or RESET asserted. If STPCLK#
is sampled negated, the processor returns to the Normal state in
less than 10 bus clock (CLK) periods. After the transition to the
Normal state, the processor resumes execution at the
instruction boundary on which STPCLK# was initially
recognized.


If STPCLK# is recognized as negated in the Stop Grant state
and subsequently sampled asserted prior to returning to the
Normal state, the AMD-K6-2 processor guarantees that a
minimum of one instruction is executed prior to re-entering the
Stop Grant state.


If INIT, INTR (if interrupts are enabled), FLUSH#, NMI, or
SMI# is sampled asserted in the Stop Grant state, the
processor latches the edge-sensitive signals (INIT, FLUSH#,
NMI, and SMI#), but otherwise does not exit the Stop Grant
state to service the interrupt. When the processor returns to the
Normal state due to sampling STPCLK# negated, any pending
interrupts are recognized after returning to the Normal state.
To ensure their recognition, all of the normal requirements for
these input signals apply within the Stop Grant state.


If RESET is sampled asserted in the Stop Grant state, the
processor immediately returns to the Normal state and the
reset process begins.


12.3 Stop Grant Inquire State 


Enter Stop Grant Inquire State


The Stop Grant Inquire state is entered from the Stop Grant
state or the Halt state when EADS# is sampled asserted during
an inquire cycle initiated by the system logic. The AMD-K6-2
processor responds to an inquire cycle in the same manner as in
the Normal state by driving HIT# and HITM#. If the inquire
cycle hits a modified data cache line, the processor performs a
writeback cycle.


Exit Stop Grant Inquire State


Following the completion of any writeback, the processor
returns to the state from which it entered the Stop Grant
Inquire state.


12.4 Stop Clock State 


Enter Stop Clock State


If the CLK signal is stopped while the AMD-K6-2 processor is in
the Stop Grant state, the processor enters the Stop Clock state.
Because neither the internal clocks nor the PLL is running in the
Stop Clock state, the Stop Clock state represents the
minimum-power state of all clock control states. The CLK signal
must be held Low while it is stopped.


The Stop Clock state cannot be entered from the Halt state.


INTR is the only input signal that is allowed to change states
while the processor is in the Stop Clock state. However, INTR is
not sampled until the processor returns to the Stop Grant state.
All other input signals must remain unchanged in the Stop
Clock state.


Exit Stop Clock State


The AMD-K6-2 processor returns to the Stop Grant state from
the Stop Clock state after the CLK signal is started and the
internal PLL has stabilized. PLL stabilization is achieved after
the CLK signal has been running within its specification for a
minimum of 1.0 ms.


The frequency of CLK when exiting the Stop Clock state can be 
different than the frequency of CLK when entering the Stop 
Clock state. 


The state of the BF[2:0] signals when exiting the Stop Clock 
state is ignored because the BF[2:0] signals are only sampled 
during the falling transition of RESET. 
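
The clock-control transitions described in this chapter can be collected into a small state-machine sketch. It models only the state changes stated in the text, treats any other event as ignored, and does not model signal timing; the event names and the `ClockControl` class are hypothetical.

```python
NORMAL = "Normal"
HALT = "Halt"
STOP_GRANT = "Stop Grant"
INQUIRE = "Stop Grant Inquire"
STOP_CLOCK = "Stop Clock"

# (current state, event) -> next state. INTR exits Halt only if interrupts
# are enabled; that condition is left to the caller in this sketch.
TRANSITIONS = {
    (NORMAL, "HLT"): HALT,
    (NORMAL, "STPCLK#_ASSERTED"): STOP_GRANT,
    (HALT, "INIT"): NORMAL,
    (HALT, "INTR"): NORMAL,
    (HALT, "NMI"): NORMAL,
    (HALT, "RESET"): NORMAL,
    (HALT, "SMI#"): NORMAL,
    (HALT, "EADS#"): INQUIRE,
    (STOP_GRANT, "STPCLK#_NEGATED"): NORMAL,
    (STOP_GRANT, "RESET"): NORMAL,
    (STOP_GRANT, "EADS#"): INQUIRE,
    (STOP_GRANT, "CLK_STOPPED"): STOP_CLOCK,
    (STOP_CLOCK, "CLK_RESTARTED"): STOP_GRANT,  # after PLL stabilization
}


class ClockControl:
    def __init__(self) -> None:
        self.state = NORMAL
        self._inquire_return = None  # state to resume after an inquire cycle

    def on(self, event: str) -> str:
        """Apply one event and return the resulting state."""
        if self.state == INQUIRE:
            if event == "INQUIRE_DONE":
                self.state = self._inquire_return
                self._inquire_return = None
            return self.state
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            return self.state  # event has no effect in this state
        if next_state == INQUIRE:
            self._inquire_return = self.state
        self.state = next_state
        return self.state
```

Because there is no entry for `(HALT, "CLK_STOPPED")`, the sketch reflects the rule that the Stop Clock state cannot be entered from the Halt state.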
Figure 91. Clock Control State Transitions

[State diagram not reproduced. Labels recoverable from the figure:
Normal Mode (Real, Virtual-8086, Protected, SMM); Stop Grant State;
Stop Grant Inquire; Stop Clock. Transition labels: HLT Instruction;
STPCLK# Asserted; RESET, SMI, INIT, or INTR Asserted; STPCLK#
Negated, or RESET Asserted; EADS# Asserted; Writeback Completed; CLK.]

13.1 Power Connections


The AMD-K6-2 processor is a dual voltage device. Two separate
supply voltages are required: VCC2 and VCC3. VCC2 provides the
core voltage for the processor and VCC3 provides the I/O voltage.
See “Electrical Data” on page 253 for the value and range of
VCC2 and VCC3.


There are 28 VCC2, 32 VCC3, and 68 VSS pins on the AMD-K6-2
processor. (See “Pin Designations” on page 297 for all power
and ground pin designations.) The large number of power and
ground pins is provided to ensure that the processor and
package maintain a clean and stable power distribution
network.


For proper operation and functionality, all VCC2, VCC3, and VSS
pins must be connected to the appropriate planes in the circuit
board. The power planes have been arranged in a pattern to
simplify routing and minimize crosstalk on the circuit board.
The isolation region between two voltage planes must be at
least 0.254 mm if they are in the same layer of the circuit board.
(See Figure 92 on page 250.) In order to maintain a
low-impedance current sink and reference, the ground plane
must never be split.


Although the AMD-K6-2 processor has two separate supply
voltages, there are no special power sequencing requirements.
The best procedure is to minimize the interval during which one
supply is on and the other is off, so that VCC2 and VCC3 are
either both on or both off.





AMD-K6®-2 Processor Data Sheet 
Preliminary Information 
21850J/0, February 2000 


[Figure 92 diagram: processor socket footprint showing the Vcc3 (I/O) plane and the Vcc2 (Core) plane, separated by an isolation region of 0.254 mm (min.), with the suggested decoupling capacitor positions.] 

Figure 92. Suggested Component Placement 


13.2 Decoupling Recommendations 


In addition to the isolation region mentioned in “Power 
Connections” on page 249, adequate decoupling capacitance is 
required between the two system power planes and the ground 
plane to minimize ringing and to provide a low-impedance path 
for return currents. Suggested decoupling capacitor placement 
is shown in Figure 92. 


Surface mounted capacitors should be used under the 
processor’s ZIF socket to minimize resistance and inductance in 
the lead lengths while maintaining minimal height. For 
information and recommendations about the specific value, 
quantity, and location of the capacitors, see the AMD-K6® 
Processor Power Supply Design Application Note, order# 21103. 
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13.3 Pin Connection Requirements 
For proper operation, the following requirements for signal pin 
connections must be met: 
■ Do not drive address and data signals into large capacitive 
  loads at high frequencies. If necessary, use buffer chips to 
  drive large capacitive loads. 
■ Leave all NC (no-connect) pins unconnected. 
■ Unused inputs should always be connected to an 
  appropriate signal level. 
  • Active Low inputs that are not being used should be 
    connected to Vcc3 through a 20-kohm pullup resistor. 
  • Active High inputs that are not being used should be 
    connected to GND through a pulldown resistor. 
■ Reserved signals can be treated in one of the following ways: 
  • As no-connect (NC) pins, in which case these pins are left 
    unconnected 
  • As pins connected to the system logic as defined by the 
    industry-standard Super7 and Socket 7 interface 
  • Any combination of NC and Socket 7 pins 
■ Keep trace lengths to a minimum. 
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14 Electrical Data 





This chapter consists of two sections, where each section 
provides the electrical specifications of the AMD-K6-2 
processor based on specific Ordering Part Number (OPN) 
suffixes. See “Ordering Information” on page 301 for a 
complete list and description of the valid OPN combinations. 


14.1 Electrical Data for OPN Suffixes AHX, 400AFQ, and AFR 


Operating Ranges 


The electrical specifications provided in this section pertain to 
the following OPNs: 


■ AMD-K6-2/475AHX 
■ AMD-K6-2/450AHX 
■ AMD-K6-2/400AFQ 
■ AMD-K6-2/380AFR 
■ AMD-K6-2/366AFR 
■ AMD-K6-2/350AFR 
■ AMD-K6-2/333AFR 
■ AMD-K6-2/300AFR 
■ AMD-K6-2/266AFR 

Note: The electrical specifications for the AMD-K6-2/400AFR OPN 
are provided in “Electrical Data for OPN Suffixes AGR, 
AFX, and 400AFR” on page 258. 


The AMD-K6-2 processor is designed to provide functional 
operation if the voltage and temperature parameters are within 
the limits defined in Table 52. 


Table 52. Operating Ranges for OPN Suffixes AHX, 400AFQ, and AFR 

Parameter   Minimum    Typical   Maximum   Comments 
Vcc2        2.1 V      2.2 V     2.3 V     Note 1, 2 
Vcc2        2.3 V      2.4 V     2.5 V     Note 1, 3 
Vcc3        3.135 V    3.30 V    3.6 V     Note 1 
TCASE       0°C                  70°C      Note 4 
TCASE       0°C                  65°C      Note 5 
TCASE       0°C                  60°C      Note 6 

Notes: 
1. Vcc2 and Vcc3 are referenced from Vss. 
2. Vcc2 specification for 2.2 V component. 
3. Vcc2 specification for 2.4 V component. 
4. Case temperature range required for AMD-K6-2/xxxAFR valid ordering part number 
   combinations, where “xxx” represents the processor core frequency. 
5. Case temperature range required for AMD-K6-2/xxxAHX valid ordering part number 
   combinations, where “xxx” represents the processor core frequency. 
6. Case temperature range required for AMD-K6-2/xxxAFQ valid ordering part number 
   combinations, where “xxx” represents the processor core frequency. 
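
As an illustration only, the limits in Table 52 could be applied in a bring-up or monitoring script along the following lines. This is a minimal Python sketch: the dictionary and function names are hypothetical and are not part of any AMD tool, and the numeric limits are simply copied from Table 52.

```python
# Limits copied from Table 52; all identifiers here are illustrative only.
VCC2_LIMITS = {          # core supply, volts: (minimum, maximum)
    "2.2V": (2.1, 2.3),  # Note 2: 2.2 V component
    "2.4V": (2.3, 2.5),  # Note 3: 2.4 V component
}
VCC3_LIMITS = (3.135, 3.6)  # I/O supply, volts: (minimum, maximum)
TCASE_MIN_C = 0.0
TCASE_MAX_C = {"AFR": 70.0, "AHX": 65.0, "AFQ": 60.0}  # Notes 4, 5, 6


def in_operating_range(vcc2, vcc3, tcase_c, component, opn_suffix):
    """Return True if all three measurements fall inside the Table 52 limits."""
    vcc2_min, vcc2_max = VCC2_LIMITS[component]
    vcc3_min, vcc3_max = VCC3_LIMITS
    return (
        vcc2_min <= vcc2 <= vcc2_max
        and vcc3_min <= vcc3 <= vcc3_max
        and TCASE_MIN_C <= tcase_c <= TCASE_MAX_C[opn_suffix]
    )


# A 2.2 V AFR part at nominal voltages and a 55 degC case temperature.
print(in_operating_range(2.2, 3.3, 55.0, "2.2V", "AFR"))
# The same readings checked against a 2.4 V AHX part: Vcc2 is below 2.3 V.
print(in_operating_range(2.2, 3.3, 55.0, "2.4V", "AHX"))
```

Keeping the limits in plain tables rather than in the function body makes it easier to swap in the ranges for a different OPN group.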
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Absolute Ratings The AMD-K6-2 processor is not designed to be operated beyond 
the operating ranges listed in Table 52. Exposure to conditions 
outside these operating ranges for extended periods of time can 
affect long-term reliability. Permanent damage can occur if the 
absolute ratings listed in Table 53 are exceeded. 


Table 53. Absolute Ratings for OPN Suffixes AHX, 400AFQ, and AFR 

Parameter            Minimum   Maximum                      Comments 
Vcc2                 -0.5 V    2.6 V 
Vcc3                 -0.5 V    3.6 V 
VPIN                 -0.5 V    Vcc3 + 0.5 V and ≤ 4.0 V     Note 
TCASE (under bias)   -65°C     +110°C 
TSTORAGE             -65°C     +150°C 

Note: 
VPIN (the voltage on any I/O pin) must not be greater than 0.5 V above the voltage being 
applied to Vcc3. In addition, the VPIN voltage must never exceed 4.0 V. 
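
The two-part limit on the I/O pin voltage can be expressed as a single bound. The following Python sketch is an illustration under the assumption that the -0.5 V minimum from Table 53 also applies; the function names are hypothetical.

```python
def max_pin_voltage(vcc3):
    """Highest allowed pin voltage (V) for the Vcc3 currently applied.

    The pin may sit at most 0.5 V above Vcc3 and may never exceed 4.0 V,
    so the effective ceiling is the smaller of the two values.
    """
    return min(vcc3 + 0.5, 4.0)


def pin_voltage_ok(vpin, vcc3):
    """Check a pin voltage against the absolute ratings in Table 53."""
    return -0.5 <= vpin <= max_pin_voltage(vcc3)


# With Vcc3 at its 3.6 V maximum the 4.0 V cap is the binding limit;
# with Vcc3 at 3.3 V the "Vcc3 + 0.5 V" term is the binding limit.
for vcc3 in (3.3, 3.6):
    print(vcc3, max_pin_voltage(vcc3))
```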











DC Characteristics The DC characteristics of the AMD-K6-2 processor are shown in 
Table 54. 


Table 54. DC Characteristics for OPN Suffixes AHX, 400AFQ, and AFR 

                                     Preliminary Data 
Symbol   Parameter Description       Min      Max      Comments 
VIL      Input Low Voltage 
VIH      Input High Voltage                            Note 1 
VOL      Output Low Voltage                            IOL = 4.0-mA load 
VOH      Output High Voltage                           IOH = 3.0-mA load 

Notes: 
1. Vcc3 refers to the voltage being applied to Vcc3 during functional operation. 
2. Vcc2 = 2.3 V. The maximum power supply current must be taken into account when designing a power supply. 
3. Vcc2 = 2.5 V. The maximum power supply current must be taken into account when designing a power supply. 
4. Vcc3 = 3.6 V. The maximum power supply current must be taken into account when designing a power supply. 
5. Refers to inputs and I/O without an internal pullup resistor and 0 ≤ VIN ≤ Vcc3. 
6. Refers to inputs with an internal pullup and VIL = 0.4 V. 
7. Refers to inputs with an internal pulldown and VIH = 2.4 V. 
8. This specification applies to components using a CLK frequency of 66 MHz. 
9. This specification applies to components using a CLK frequency of 95 MHz. 
10. This specification applies to components using a CLK frequency of 100 MHz. 
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Table 54. DC Characteristics for OPN Suffixes AHX, 400AFQ, and AFR (continued) 

                                                        Preliminary Data 
Symbol   Parameter Description                          Min        Max        Comments 
ICC2     2.2 V Power Supply Current                                7.35 A     266 MHz, Note 2, 8 
                                                                   8.45 A     300 MHz, Note 2, 8, 10 
                                                                   9.40 A     333 MHz, Note 2, 8, 9 
                                                                   9.85 A     350 MHz, Note 2, 10 
                                                                   10.30 A    366 MHz, Note 2, 8 
                                                                   10.70 A    380 MHz, Note 2, 9 
                                                                   11.25 A    400 MHz, Note 2, 8, 10 
         2.4 V Power Supply Current                                12.50 A    450 MHz, Note 3, 10 
                                                                   13.00 A    475 MHz, Note 3, 9 
ICC3     3.3 V Power Supply Current                                0.54 A     266 MHz, Note 4, 8 
                                                                   0.56 A     300 MHz, Note 4, 8, 10 
                                                                   0.58 A     333 MHz, Note 4, 8, 9 
                                                                   0.60 A     350 MHz, Note 4, 10 
                                                                   0.60 A     366 MHz, Note 4, 8 
                                                                   0.61 A     380 MHz, Note 4, 9 
                                                                   0.62 A     400 MHz, Note 4, 8, 10 
                                                                   0.66 A     450 MHz, Note 4, 10 
                                                                   0.67 A     475 MHz, Note 4, 9 
ILI      Input Leakage Current                                     ±15 µA     Note 5 
ILO      Output Leakage Current                                    ±15 µA     Note 5 
IIL      Input Leakage Current Bias with Pullup         -400 µA               Note 6 
IIH      Input Leakage Current Bias with Pulldown                  200 µA     Note 7 
CIN      Input Capacitance                                         10 pF 
COUT     Output Capacitance                                        15 pF 
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Table 54. DC Characteristics for OPN Suffixes AHX, 400AFQ, and AFR (continued) 

                                                        Preliminary Data 
Symbol   Parameter Description                          Min        Max        Comments 
COUT     I/O Capacitance                                           20 pF 
CCLK     CLK Capacitance                                           10 pF 
CTIN     Test Input Capacitance (TDI, TMS, TRST#)                  10 pF 
CTOUT    Test Output Capacitance (TDO)                             15 pF 
CTCK     TCK Capacitance                                           10 pF 
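
Notes 2 through 4 of Table 54 state that the maximum supply currents, specified at the maximum supply voltages, must be taken into account when designing a power supply. One way to turn those figures into a per-rail budget is sketched below in Python. This is an illustration, not an AMD-specified procedure: the table name `MAX_CURRENT` and the function name are hypothetical, the current values are copied from Table 54, and multiplying current by voltage to size each rail is an assumption of this sketch.

```python
# Maximum supply currents from Table 54, keyed by core frequency in MHz:
# (Icc2 in A, Vcc2 at which Icc2 is specified in V, Icc3 in A).
# Icc3 is specified at Vcc3 = 3.6 V (Note 4).
MAX_CURRENT = {
    266: (7.35, 2.3, 0.54),
    300: (8.45, 2.3, 0.56),
    333: (9.40, 2.3, 0.58),
    350: (9.85, 2.3, 0.60),
    366: (10.30, 2.3, 0.60),
    380: (10.70, 2.3, 0.61),
    400: (11.25, 2.3, 0.62),
    450: (12.50, 2.5, 0.66),
    475: (13.00, 2.5, 0.67),
}
VCC3_MAX = 3.6  # volts


def worst_case_supply_power(freq_mhz):
    """Return (core_watts, io_watts) each rail must be able to deliver."""
    icc2, vcc2_max, icc3 = MAX_CURRENT[freq_mhz]
    return icc2 * vcc2_max, icc3 * VCC3_MAX


for freq in sorted(MAX_CURRENT):
    core_w, io_w = worst_case_supply_power(freq)
    print(f"{freq} MHz: core {core_w:.2f} W, I/O {io_w:.2f} W")
```

These supply-side figures are computed at the maximum supply voltages, so they are not the same quantity as the thermal power in the power dissipation tables, which is specified at the nominal voltages.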
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Power Dissipation Table 55 contains the typical and maximum power dissipation 
of the AMD-K6-2 processor during normal and reduced power 
states. 


Table 55. Typical and Maximum Power Dissipation for OPN Suffixes AHX, 400AFQ, and AFR 

Clock Control State         266 MHz       300 MHz       333 MHz   350 MHz   366 MHz   380 MHz   400 MHz   450 MHz   475 MHz   Notes 
CLK frequency note          6             6, 8          6, 7      8         6         7         6, 8      8         7 
Thermal Power (Maximum)     14.70 W       17.20 W       19.00 W   19.95 W   20.80 W   21.60 W   22.70 W   28.40 W   29.60 W   1, 2 
Thermal Power (Typical)     8.85 W        10.35 W       11.40 W   11.98 W   12.48 W   12.95 W   13.65 W   17.05 W   17.75 W   3 
Stop Grant/Halt (Maximum)   [illegible]   [illegible]   3.94 W    3.96 W    3.96 W    3.97 W    3.98 W    6.50 W    6.51 W    4 
Stop Clock (Maximum)        3.50 W        3.50 W        3.50 W    3.50 W    3.50 W    3.50 W    3.50 W    6.00 W    6.00 W    5 

Notes: 
1. The maximum power dissipated in the normal clock control state must be taken into account when designing a solution for thermal 
   dissipation for the AMD-K6-2 processor. 
2. Maximum power is determined for the worst-case instruction sequence or function for the listed clock control states with 
   Vcc2 = 2.2 V (for the 2.2 V component) or Vcc2 = 2.4 V (for the 2.4 V component) and Vcc3 = 3.3 V. 
3. Typical power is determined for the typical instruction sequences or functions associated with normal system operation with 
   Vcc2 = 2.2 V (for the 2.2 V component) or Vcc2 = 2.4 V (for the 2.4 V component) and Vcc3 = 3.3 V. 
4. The CLK signal and the internal PLL are still running but most internal clocking has stopped. 
5. The CLK signal, the internal PLL, and all internal clocking have stopped. 
6. This specification applies to components using a CLK frequency of 66 MHz. 
7. This specification applies to components using a CLK frequency of 95 MHz. 
8. This specification applies to components using a CLK frequency of 100 MHz. 
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14.2 Electrical Data for OPN Suffixes AGR, AFX, and 400AFR 


Operating Ranges 


The electrical specifications provided in this section pertain to 
the following OPNs: 


■ AMD-K6-2/550AGR 
■ AMD-K6-2/533AFX 
■ AMD-K6-2/500AFX 
■ AMD-K6-2/475AFX 
■ AMD-K6-2/450AFX 
■ AMD-K6-2/400AFR 

Note: The electrical specifications for all frequencies of the OPN 
suffix AFR other than 400 MHz are provided in “Electrical 
Data for OPN Suffixes AHX, 400AFQ, and AFR” on 
page 253. 


The AMD-K6-2 processor is designed to provide functional 
operation if the voltage and temperature parameters are within 
the limits defined in Table 56. 


Table 56. Operating Ranges for OPN Suffixes AGR, AFX, and 400AFR 

Parameter   Minimum    Typical   Maximum   Comments 
Vcc2        2.1 V      2.2 V     2.3 V     Note 1, 2 
Vcc2        2.2 V      2.3 V     2.4 V     Note 1, 3 
Vcc3        3.135 V    3.30 V    3.6 V     Note 1 
TCASE       0°C                  70°C      Note 4 
TCASE       0°C                  65°C      Note 5 

Notes: 
1. Vcc2 and Vcc3 are referenced from Vss. 
2. Vcc2 specification for 2.2 V components. 
3. Vcc2 specification for 2.3 V components. 
4. Case temperature range required for AMD-K6-2/550AGR and AMD-K6-2/400AFR ordering 
   part numbers. 
5. Case temperature range required for AMD-K6-2/xxxAFX valid ordering part number 
   combinations, where “xxx” represents the processor core frequency. 
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Absolute Ratings The AMD-K6-2 processor is not designed to be operated beyond 
the operating ranges listed in Table 56. Exposure to conditions 
outside these operating ranges for extended periods of time can 
affect long-term reliability. Permanent damage can occur if the 
absolute ratings listed in Table 57 are exceeded. 


Table 57. Absolute Ratings for OPN Suffixes AGR, AFX, and 400AFR 

Parameter            Minimum   Maximum                      Comments 
Vcc2                 -0.5 V    2.5 V 
Vcc3                 -0.5 V    3.6 V 
VPIN                 -0.5 V    Vcc3 + 0.5 V and ≤ 4.0 V     Note 
TCASE (under bias)   -65°C     +110°C 
TSTORAGE             -65°C     +150°C 

Note: 
VPIN (the voltage on any I/O pin) must not be greater than 0.5 V above the voltage being 
applied to Vcc3. In addition, the VPIN voltage must never exceed 4.0 V. 








DC Characteristics The DC characteristics of the AMD-K6-2 processor are shown in 
Table 58. 


Table 58. DC Characteristics for OPN Suffixes AGR, AFX, and 400AFR 

                                     Preliminary Data 
Symbol   Parameter Description       Min      Max      Comments 
VIL      Input Low Voltage 
VIH      Input High Voltage                            Note 1 
VOL      Output Low Voltage                            IOL = 4.0-mA load 
VOH      Output High Voltage                           IOH = 3.0-mA load 

Notes: 
1. Vcc3 refers to the voltage being applied to Vcc3 during functional operation. 
2. Vcc2 = 2.3 V. The maximum power supply current must be taken into account when designing a power supply. 
3. Vcc2 = 2.4 V. The maximum power supply current must be taken into account when designing a power supply. 
4. Vcc3 = 3.6 V. The maximum power supply current must be taken into account when designing a power supply. 
5. Refers to inputs and I/O without an internal pullup resistor and 0 ≤ VIN ≤ Vcc3. 
6. Refers to inputs with an internal pullup and VIL = 0.4 V. 
7. Refers to inputs with an internal pulldown and VIH = 2.4 V. 
8. This specification applies to components using a CLK frequency of 66 MHz. 
9. This specification applies to components using a CLK frequency of 95 MHz. 
10. This specification applies to components using a CLK frequency of 100 MHz. 
11. This specification applies to components using a CLK frequency of 97 MHz. 
12. The specifications provided for the 533 MHz component are identical to the specifications of the 500 MHz component. 
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Table 58. DC Characteristics for OPN Suffixes AGR, AFX, and 400AFR (continued) 

                                                        Preliminary Data 
Symbol   Parameter Description                          Min        Max        Comments 
ICC2     2.2 V Power Supply Current                                10.00 A    400 MHz, Note 2, 8, 10 
                                                                   11.25 A    450 MHz, Note 2, 10 
                                                                   11.90 A    475 MHz, Note 2, 9 
                                                                   12.50 A    500 MHz, Note 2, 10 
                                                                   12.50 A    533 MHz, Note 2, 11, 12 
         2.3 V Power Supply Current                                13.00 A    550 MHz, Note 3, 10 
ICC3     3.3 V Power Supply Current                                0.62 A     400 MHz, Note 4, 8, 10 
                                                                   0.66 A     450 MHz, Note 4, 10 
                                                                   0.67 A     475 MHz, Note 4, 9 
                                                                   0.69 A     500 MHz, Note 4, 10 
                                                                   0.69 A     533 MHz, Note 4, 11, 12 
                                                                   0.69 A     550 MHz, Note 4, 10 
ILI      Input Leakage Current                                     ±15 µA     Note 5 
ILO      Output Leakage Current                                    ±15 µA     Note 5 
IIL      Input Leakage Current Bias with Pullup         -400 µA               Note 6 
IIH      Input Leakage Current Bias with Pulldown                  200 µA     Note 7 
CIN      Input Capacitance                                         10 pF 
COUT     Output Capacitance                                        15 pF 
COUT     I/O Capacitance                                           20 pF 
CCLK     CLK Capacitance                                           10 pF 
CTIN     Test Input Capacitance (TDI, TMS, TRST#)                  10 pF 
CTOUT    Test Output Capacitance (TDO)                             15 pF 
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Table 58. DC Characteristics for OPN Suffixes AGR, AFX, and 400AFR (continued) 

                                                        Preliminary Data 
Symbol   Parameter Description                          Min        Max        Comments 
CTCK     TCK Capacitance                                           10 pF 
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Table 59 contains the typical and maximum power dissipation 
of the AMD-K6-2 processor during normal and reduced power 
states. 


Table 59. Typical and Maximum Power Dissipation for OPN Suffixes AGR, AFX, and 400AFR 

Clock Control State         400 MHz       450 MHz       475 MHz   500 MHz   533 MHz   550 MHz   Notes 
CLK frequency note          6, 8          8             7         8         9, 10     8 
Thermal Power (Maximum)     16.90 W       18.80 W       19.80 W   20.75 W   20.75 W   25.00 W   1, 2 
Thermal Power (Typical)     10.15 W       11.30 W       11.90 W   12.45 W   12.45 W   15.00 W   3 
Stop Grant/Halt (Maximum)   [illegible]   [illegible]   4.45 W    4.46 W    4.46 W    4.87 W    4 
Stop Clock (Maximum)        4.00 W        4.00 W        4.00 W    4.00 W    4.00 W    4.37 W    5 

Notes: 
1. The maximum power dissipated in the normal clock control state must be taken into account when 
   designing a solution for thermal dissipation for the AMD-K6-2 processor. 
2. Maximum power is determined for the worst-case instruction sequence or function for the listed 
   clock control states with Vcc2 = 2.2 V (for 2.2 V components) or Vcc2 = 2.3 V (for 2.3 V components) 
   and Vcc3 = 3.3 V. 
3. Typical power is determined for the typical instruction sequences or functions associated with 
   normal system operation with Vcc2 = 2.2 V (for 2.2 V components) or Vcc2 = 2.3 V (for 2.3 V 
   components) and Vcc3 = 3.3 V. 
4. The CLK signal and the internal PLL are still running but most internal clocking has stopped. 
5. The CLK signal, the internal PLL, and all internal clocking have stopped. 
6. This specification applies to components using a CLK frequency of 66 MHz. 
7. This specification applies to components using a CLK frequency of 95 MHz. 
8. This specification applies to components using a CLK frequency of 100 MHz. 
9. This specification applies to components using a CLK frequency of 97 MHz. 
10. The specifications provided for the 533 MHz component are identical to the specifications of the 500 
    MHz component. 
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15 I/O Buffer Characteristics 





All of the AMD-K6-2 processor inputs, outputs, and 
bidirectional buffers are implemented using a 3.3 V buffer 
design. In addition, a subset of the processor I/O buffers includes 
a second, higher drive strength option. These buffers can be 
configured to provide the higher drive strength for applications 
that place a heavier load on these I/O signals. 


AMD has developed two I/O buffer models that represent the 
characteristics of each of the two possible drive strength 
configurations supported by the AMD-K6-2 processor. These 
two models are called the Standard I/O Model and the Strong 
I/O Model. 


AMD developed the two models to allow system designers to 
perform analog simulations of AMD-K6-2 processor signals that 
interface with the system logic. Analog simulations are used to 
determine a signal’s time of flight from source to destination 
and to ensure that the system’s signal quality requirements are 
met. Signal quality measurements include overshoot, 
undershoot, slope reversal, and ringing. 


15.1 Selectable Drive Strength 


The AMD-K6-2 processor samples the BRDYC# input during the 
falling transition of RESET to configure the drive strength of 
A[20:3], ADS#, HITM# and W/R#. If BRDYC# is 0 during the fall 
of RESET, these particular outputs are configured using the 
higher drive strength. If BRDYC# is 1 during the fall of RESET, 
the standard drive strength is selected for all I/O buffers. 


Table 60 shows the relationship between BRDYC# and the two 
available drive strengths — K6STD and K6STG. 


Table 60. A[20:3], ADS#, HITM#, and W/R# Strength Selection 

Drive Strength          BRDYC#   I/O Buffer Name 
Strength 1 (standard)   1        K6STD 
Strength 2 (strong)     0        K6STG 
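
When setting up a board-level simulation, the selection rule in Table 60 amounts to a small lookup. The following Python sketch is illustrative only; the function name and the way signals are written as strings are assumptions, while the signal list and buffer names come from the text above.

```python
# Signals whose drive strength is configurable through BRDYC#.
STRONG_CAPABLE = ("A[20:3]", "ADS#", "HITM#", "W/R#")


def io_buffer_name(signal, brdyc_at_reset_fall):
    """Return the I/O buffer name that applies to a signal.

    brdyc_at_reset_fall is the level of BRDYC# (0 or 1) sampled during
    the falling transition of RESET.
    """
    if brdyc_at_reset_fall == 0 and signal in STRONG_CAPABLE:
        return "K6STG"  # Strength 2 (strong)
    return "K6STD"      # Strength 1 (standard)


for sig in ("ADS#", "HITM#", "D[63:0]"):
    print(sig, io_buffer_name(sig, 0), io_buffer_name(sig, 1))
```

Signals outside the configurable group always map to K6STD, regardless of the sampled BRDYC# level.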
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15.2 I/O Buffer Model 


AMD provides models of the AMD-K6-2 processor I/O buffers 
for system designers to use in board-level simulations. These I/O 
buffer models conform to the I/O Buffer Information 
Specification (IBIS). The Standard I/O Model uses K6STD, the 
standard I/O buffer representation, for all I/O buffers. The 
Strong I/O Model uses K6STG, the stronger I/O buffer 
representation for A[20:3], ADS#, HITM#, and W/R#, and uses 
K6STD for the remainder of the I/O buffers. 


Both I/O models contain voltage versus current (V/I) and 
voltage versus time (V/T) data tables for accurate modeling of 
I/O buffer behavior. 


The following list characterizes the properties of each I/O 
buffer model: 


■ All data tables contain minimum, typical, and maximum 
  values to allow for worst-case, typical, and best-case 
  simulations, respectively. 
■ The pullup, pulldown, power clamp, and ground clamp 
  device V/I tables contain enough data points to accurately 
  represent the nonlinear nature of the V/I curves. In addition, 
  the voltage ranges provided in these tables extend beyond 
  the normal operating range of the AMD-K6-2 processor for 
  those simulators that yield more accurate results based on 
  this wider range. Figure 93 and Figure 94 on page 265 
  illustrate the min/typ/max pulldown and pullup V/I curves 
  for K6STD between 0 V and 3.3 V. 
■ The rising and falling ramp rates are specified. 
■ The min/typ/max Vcc3 operating range is specified as 
  3.135 V, 3.3 V, and 3.6 V, respectively. 
■ VIL = 0.8 V, VIH = 2.0 V, and Vmeas = 1.5 V 
■ The R/L/C of the package is modeled. 
■ The capacitance of the silicon die is modeled. 
■ The model assumes a test load resistance of 50 Ω. 
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Figure 93. K6STD Pulldown V/I Curves 
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Figure 94. K6STD Pullup V/I Curves 


15.3 I/O Model Application Note 


For the AMD-K6-2 processor I/O Buffer IBIS Models and their 
application, refer to the AMD-K6® Processor I/O Model (IBIS) 
Application Note, order# 21084. 


15.4 I/O Buffer AC and DC Characteristics 


See “Signal Switching Characteristics” on page 267 for the 
AMD-K6-2 processor AC timing specifications. 


See “Electrical Data” on page 253 for the AMD-K6-2 processor 
DC specifications. 
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16 Signal Switching Characteristics 





The AMD-K6-2 processor signal switching characteristics are 
presented in Table 61 through Table 70. Valid delay, float, 
setup, and hold timing specifications are listed. These 
specifications are provided for the system designer to 
determine if the timings necessary for the processor to 
interface with the system logic are met. Table 61 and Table 62 
contain the switching characteristics of the CLK input. Table 63 
through Table 66 contain the timings for the normal operation 
signals. Table 67 and Table 68 contain the timings for RESET 
and the configuration signals. Table 69 and Table 70 contain the 
timings for the test operation signals. 


All signal timings provided are: 

■ Measured between CLK, TCK, or RESET at 1.5 V and the 
  corresponding signal at 1.5 V. This applies to input and 
  output signals that are switching from Low to High, or from 
  High to Low 
■ Based on input signals applied at a slew rate of 1 V/ns 
  between 0 V and 3 V (rising) and 3 V to 0 V (falling) 
■ Valid within the operating ranges given in “Operating 
  Ranges” on page 253 
■ Based on a load capacitance (CL) of 0 pF 


16.1 CLK Switching Characteristics 


Table 61 and Table 62 contain the switching characteristics of 
the CLK input to the AMD-K6-2 processor for 100-MHz and 
66-MHz bus operation, respectively, as measured at the voltage 
levels indicated by Figure 95 on page 269. 


The CLK Period Stability specifies the variance (jitter) allowed 
between successive periods of the CLK input measured at 1.5 V. 
This parameter must be considered as one of the elements of 
clock skew between the AMD-K6-2 processor and the system 
logic. 
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16.2 Clock Switching Characteristics for 100-MHz Bus Operation 


Table 61. CLK Switching Characteristics for 100-MHz Bus Operation 

                                   Preliminary Data 
Symbol   Parameter Description     Min         Max        Comments 
         Frequency                 33.3 MHz    100 MHz    In Normal Mode 
t1       CLK Period                10.0 ns                In Normal Mode 
t2       CLK High Time             3.0 ns 
t3       CLK Low Time              3.0 ns 
t4       CLK Fall Time             0.15 ns 
t5       CLK Rise Time             0.15 ns 
         CLK Period Stability                             Note 

Note: 
Jitter frequency power spectrum peaking must occur at frequencies greater than (Frequency of CLK)/3 or less than 500 kHz. 








16.3 Clock Switching Characteristics for 66-MHz Bus Operation 


Table 62. CLK Switching Characteristics for 66-MHz Bus Operation 

                                   Preliminary Data 
Symbol   Parameter Description     Min         Max         Comments 
         Frequency                 33.3 MHz    66.6 MHz    In Normal Mode 
t1       CLK Period                15.0 ns     30.0 ns     In Normal Mode 
t2       CLK High Time             4.0 ns 
t3       CLK Low Time              4.0 ns 
t4       CLK Fall Time             0.15 ns 
t5       CLK Rise Time             0.15 ns 
         CLK Period Stability                              Note 

Note: 
Jitter frequency power spectrum peaking must occur at frequencies greater than (Frequency of CLK)/3 or less than 500 kHz. 
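
As a rough illustration of how the CLK limits and the jitter note might be checked against measured clock data, consider the following Python sketch. The function names are hypothetical, the period and high/low time values are transcribed from Table 61 and Table 62 and treated here as minimums (an assumption of this sketch), and rise/fall time and period stability limits are left out.

```python
# Values transcribed from Tables 61 and 62, treated here as minimums (ns).
CLK_LIMITS_NS = {
    100: {"period_min": 10.0, "high_min": 3.0, "low_min": 3.0},
    66: {"period_min": 15.0, "high_min": 4.0, "low_min": 4.0},
}


def clk_waveform_ok(bus_mhz, period_ns, high_ns, low_ns):
    """Check a measured CLK period and high/low times against the limits."""
    lim = CLK_LIMITS_NS[bus_mhz]
    return (
        period_ns >= lim["period_min"]
        and high_ns >= lim["high_min"]
        and low_ns >= lim["low_min"]
    )


def jitter_peak_ok(peak_hz, clk_hz):
    """Jitter spectrum peaking must lie above CLK/3 or below 500 kHz."""
    return peak_hz > clk_hz / 3.0 or peak_hz < 500e3


print(clk_waveform_ok(100, 10.0, 5.0, 5.0))
print(jitter_peak_ok(10e6, 100e6))  # 10 MHz lies inside the excluded band
```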
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Figure 95. CLK Waveform 
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16.4 Valid Delay, Float, Setup, and Hold Timings 


Valid delay and float timings are given for output signals during 
functional operation and are given relative to the rising edge of 
CLK. During boundary-scan testing, valid delay and float 
timings for output signals are with respect to the falling edge of 
TCK. The maximum valid delay timings are provided to allow a 
system designer to determine if setup times to the system logic 
can be met. Likewise, the minimum valid delay timings are used 
to analyze hold times to the system logic. 


The setup and hold time requirements for the AMD-K6-2 
processor input signals must be met by the system logic to 
assure the proper operation of the AMD-K6-2 processor. The 
setup and hold timings during functional and boundary-scan 
test mode are given relative to the rising edge of CLK and TCK, 
respectively. 
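
The setup and hold analysis described above is commonly written as two margin equations. The Python sketch below shows one conventional form; it is not taken from this data sheet. The parameter names, the inclusion of flight time and clock skew terms, and the example numbers are all assumptions chosen for illustration.

```python
def setup_margin_ns(clk_period_ns, max_valid_delay_ns, flight_ns,
                    rx_setup_ns, skew_ns=0.0):
    """Time left over after the latest output reaches the receiver.

    Uses the MAXIMUM valid delay of the driving device; a negative
    result means the receiver's setup time is violated.
    """
    return clk_period_ns - (max_valid_delay_ns + flight_ns
                            + rx_setup_ns + skew_ns)


def hold_margin_ns(min_valid_delay_ns, flight_ns, rx_hold_ns, skew_ns=0.0):
    """Time by which the earliest output change clears the hold window.

    Uses the MINIMUM valid delay of the driving device; a negative
    result means the receiver's hold time is violated.
    """
    return min_valid_delay_ns + flight_ns - rx_hold_ns - skew_ns


# Hypothetical numbers for a 100-MHz bus (10.0 ns CLK period).
print(setup_margin_ns(10.0, 4.0, 1.5, 3.0, 0.5))
print(hold_margin_ns(1.1, 0.5, 1.0, 0.25))
```

The same two functions apply in either direction: processor outputs against the system logic's setup and hold requirements, or system logic outputs against the processor's input setup and hold timings.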
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16.5 Output Delay Timings for 100-MHz Bus Operation 


Table 63. Output Delay Timings for 100-MHz Bus Operation 
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                                          Preliminary Data 
Symbol   Parameter Description            Min    Max    Figure   Comments 
t6       A[31:3] Valid Delay                            97 
t7       A[31:3] Float Delay                            98 
t8       ADS# Valid Delay                               97 
t9       ADS# Float Delay                               98 
t10      ADSC# Valid Delay                              97 
t11      ADSC# Float Delay                              98 
t12      AP Valid Delay                                 97 
t13      AP Float Delay                                 98 
t14      APCHK# Valid Delay                             97 
t15      BE[7:0]# Valid Delay                           97 
t16      BE[7:0]# Float Delay                           98 
t17      BREQ Valid Delay                               97 
t18      CACHE# Valid Delay                             97 
t19      CACHE# Float Delay                             98 
t20      D/C# Valid Delay                               97 
t21      D/C# Float Delay                               98 
t22      D[63:0] Write Data Valid Delay                 97 
t23      D[63:0] Write Data Float Delay                 98 
t24      DP[7:0] Write Data Valid Delay                 97 
t25      DP[7:0] Write Data Float Delay                 98 
t26      FERR# Valid Delay                              97 
t27      HIT# Valid Delay                               97 
t28      HITM# Valid Delay                              97 
t29      HLDA Valid Delay                               97 
t30      LOCK# Valid Delay                              97 
t31      LOCK# Float Delay                              98 
t32      M/IO# Valid Delay                              97 
t33      M/IO# Float Delay                              98 
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Table 63. Output Delay Timings for 100-MHz Bus Operation (continued) 

                                          Preliminary Data 
Symbol   Parameter Description            Min    Max    Figure   Comments 
t34      PCD Valid Delay                                97 
t35      PCD Float Delay                                98 
t36      PCHK# Valid Delay                              97 
t37      PWT Valid Delay                                97 
t38      PWT Float Delay                                98 
t39      SCYC Valid Delay                               97 
t40      SCYC Float Delay                               98 
t41      SMIACT# Valid Delay                            97 
t42      W/R# Valid Delay                               97 
t43      W/R# Float Delay                               98 
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16.6 Input Setup and Hold Timings for 100-MHz Bus Operation 


Table 64. Input Setup and Hold Timings for 100-MHz Bus Operation 

                                          Preliminary Data 
Symbol   Parameter Description            Min    Max    Comments 
t44      A[31:5] Setup Time 
t45      A[31:5] Hold Time 
t46      A20M# Setup Time                               Note 1 
t47      A20M# Hold Time                                Note 1 
t48      AHOLD Setup Time 
t49      AHOLD Hold Time 
t50      AP Setup Time 
t51      AP Hold Time 
t52      BOFF# Setup Time 
t53      BOFF# Hold Time 
t54      BRDY# Setup Time 
t55      BRDY# Hold Time 
t56      BRDYC# Setup Time 
t57      BRDYC# Hold Time 
t58      D[63:0] Read Data Setup Time 
t59      D[63:0] Read Data Hold Time 
t60      DP[7:0] Read Data Setup Time 
t61      DP[7:0] Read Data Hold Time 
t62      EADS# Setup Time 
t63      EADS# Hold Time 
t64      EWBE# Setup Time 
t65      EWBE# Hold Time 
t66      FLUSH# Setup Time                              Note 2 
t67      FLUSH# Hold Time                               Note 2 

Notes: 
1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 
   hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks. 
2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 
   hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must 
   remain asserted at least two clocks. 
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Table 64. Input Setup and Hold Timings for 100-MHz Bus Operation (continued) 
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Symbol Parameter Description preennaly wal Figure Comments 
Min Max 

teg HOLD Setup Time 99 

teg HOLD Hold Time 99 

t7o IGNNE# Setup Time 99 Note 1 

ty IGNNE# Hold Time 99 Note 1 

ty INIT Setup Time 99 Note 2 

ty; INIT Hold Time 99 Note 2 

tq INTR Setup Time 99 Note 1 

ts INTR Hold Time 99 Note 1 

ty6 INV Setup Time 99 

t77 INV Hold Time 99 

tz KEN# Setup Time 99 

ty9 KEN# Hold Time 99 

tgo NA# Setup Time 99 

tg) NA# Hold Time 99 

tgp NMI Setup Time 99 Note 2 

tgs NMI Hold Time 99 Note 2 

tga SMI# Setup Time 99 Note 2 

tgs SMI# Hold Time 99 Note 2 

tg¢ STPCLK# Setup Time 99 Note 1 

tg7 STPCLK# Hold Time 99 Note 1 

tgg WB/WT# Setup Time 99 

tg9 WB/WT# Hold Time 99 

Notes: 

1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 

hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks. 
2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and 

hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must 
remain asserted at least two clocks. 
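
The two notes on asynchronous assertion can be expressed as a small rule check. This is only a sketch under stated assumptions: the function name, the dictionary, and the clock counts passed in are hypothetical, while the grouping of signals follows the Note 1 and Note 2 references in Table 64.

```python
# Signals tagged Note 1 (level-sensitive) and Note 2 (edge-sensitive) in Table 64.
SIGNAL_KIND = {
    "A20M#": "level", "IGNNE#": "level", "INTR": "level", "STPCLK#": "level",
    "FLUSH#": "edge", "INIT": "edge", "NMI": "edge", "SMI#": "edge",
}

MIN_CLOCKS = 2  # both notes use a two-clock minimum


def async_assertion_ok(signal, asserted_clocks, negated_clocks_before=None):
    """Check an asynchronous assertion against Note 1 or Note 2.

    asserted_clocks: how long the signal stays asserted, in bus clocks
    negated_clocks_before: how long it was negated before assertion
                           (required for edge-sensitive signals only)
    """
    kind = SIGNAL_KIND[signal]
    if kind == "level":
        return asserted_clocks >= MIN_CLOCKS
    if negated_clocks_before is None:
        raise ValueError("edge-sensitive signals need negated_clocks_before")
    return (negated_clocks_before >= MIN_CLOCKS
            and asserted_clocks >= MIN_CLOCKS)


print(async_assertion_ok("INTR", asserted_clocks=2))
print(async_assertion_ok("NMI", asserted_clocks=3, negated_clocks_before=1))
```

The INTR case satisfies Note 1. The NMI case fails Note 2 because the signal was negated for only one clock before being asserted.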
16.7 Output Delay Timings for 66-MHz Bus Operation


Table 65. Output Delay Timings for 66-MHz Bus Operation

                                        Preliminary Data
Symbol  Parameter Description           Min  Max  Figure  Comments
t6      A[31:3] Valid Delay                       97
t7      A[31:3] Float Delay                       98
t8      ADS# Valid Delay                          97
t9      ADS# Float Delay                          98
t10     ADSC# Valid Delay                         97
t11     ADSC# Float Delay                         98
t12     AP Valid Delay                            97
t13     AP Float Delay                            98
t14     APCHK# Valid Delay                        97
t15     BE[7:0]# Valid Delay                      97
t16     BE[7:0]# Float Delay                      98
t17     BREQ Valid Delay                          97
t18     CACHE# Valid Delay                        97
t19     CACHE# Float Delay                        98
t20     D/C# Valid Delay                          97
t21     D/C# Float Delay                          98
t22     D[63:0] Write Data Valid Delay            97
t23     D[63:0] Write Data Float Delay            98
t24     DP[7:0] Write Data Valid Delay            97
t25     DP[7:0] Write Data Float Delay            98
t26     FERR# Valid Delay                         97
t27     HIT# Valid Delay                          97
t28     HITM# Valid Delay                         97
t29     HLDA Valid Delay                          97
t30     LOCK# Valid Delay                         97
t31     LOCK# Float Delay                         98
t32     M/IO# Valid Delay                         97
t33     M/IO# Float Delay                         98
t34     PCD Valid Delay                           97
t35     PCD Float Delay                           98
t36     PCHK# Valid Delay                         97
t37     PWT Valid Delay                           97
t38     PWT Float Delay                           98
t39     SCYC Valid Delay                          97
t40     SCYC Float Delay                          98
t41     SMIACT# Valid Delay                       97
t42     W/R# Valid Delay                          97
t43     W/R# Float Delay                          98
16.8 Input Setup and Hold Timings for 66-MHz Bus Operation


Table 66. Input Setup and Hold Timings for 66-MHz Bus Operation

                                        Preliminary Data
Symbol  Parameter Description           Min  Max  Figure  Comments
t44     A[31:5] Setup Time                        99
t45     A[31:5] Hold Time                         99
t46     A20M# Setup Time                          99      Note 1
t47     A20M# Hold Time                           99      Note 1
t48     AHOLD Setup Time                          99
t49     AHOLD Hold Time                           99
t50     AP Setup Time                             99
t51     AP Hold Time                              99
t52     BOFF# Setup Time                          99
t53     BOFF# Hold Time                           99
t54     BRDY# Setup Time                          99
t55     BRDY# Hold Time                           99
t56     BRDYC# Setup Time                         99
t57     BRDYC# Hold Time                          99
t58     D[63:0] Read Data Setup Time              99
t59     D[63:0] Read Data Hold Time               99
t60     DP[7:0] Read Data Setup Time              99
t61     DP[7:0] Read Data Hold Time               99
t62     EADS# Setup Time                          99
t63     EADS# Hold Time                           99
t64     EWBE# Setup Time                          99
t65     EWBE# Hold Time                           99
t66     FLUSH# Setup Time                         99      Note 2
t67     FLUSH# Hold Time                          99      Note 2
t68     HOLD Setup Time                           99
t69     HOLD Hold Time                            99
t70     IGNNE# Setup Time                         99      Note 1
t71     IGNNE# Hold Time                          99      Note 1
t72     INIT Setup Time                           99      Note 2
t73     INIT Hold Time                            99      Note 2
t74     INTR Setup Time                           99      Note 1
t75     INTR Hold Time                            99      Note 1
t76     INV Setup Time                            99
t77     INV Hold Time                             99
t78     KEN# Setup Time                           99
t79     KEN# Hold Time                            99
t80     NA# Setup Time                            99
t81     NA# Hold Time                             99
t82     NMI Setup Time                            99      Note 2
t83     NMI Hold Time                             99      Note 2
t84     SMI# Setup Time                           99      Note 2
t85     SMI# Hold Time                            99      Note 2
t86     STPCLK# Setup Time                        99      Note 1
t87     STPCLK# Hold Time                         99      Note 1
t88     WB/WT# Setup Time                         99
t89     WB/WT# Hold Time                          99

Notes:
1. These level-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
   hold times must be met. If asserted asynchronously, they must be asserted for a minimum pulse width of two clocks.
2. These edge-sensitive signals can be asserted synchronously or asynchronously. To be sampled on a specific clock edge, setup and
   hold times must be met. If asserted asynchronously, they must have been negated at least two clocks prior to assertion and must
   remain asserted at least two clocks.
16.9 RESET and Test Signal Timing


Table 67. RESET and Configuration Signals for 100-MHz Bus Operation

Symbol  Parameter Description                   Min        Comments
t90     RESET Setup Time                        1.7 ns
t91     RESET Hold Time                         1.0 ns
t92     RESET Pulse Width, Vcc and CLK Stable   15 clocks
t93     RESET Active After Vcc and CLK Stable   1.0 ms
t94     BF[2:0] Setup Time                      1.0 ms     Note 3
t95     BF[2:0] Hold Time                       2 clocks   Note 3
t96     BRDYC# Hold Time                        1.0 ns     Note 4
t97     BRDYC# Setup Time                       2 clocks   Note 2
t98     BRDYC# Hold Time                        2 clocks   Note 2
t99     FLUSH# Setup Time                       1.7 ns     Note 1
t100    FLUSH# Hold Time                        1.0 ns     Note 1
t101    FLUSH# Setup Time                       2 clocks   Note 2
t102    FLUSH# Hold Time                        2 clocks   Note 2

Notes:
1. To be sampled on a specific clock edge, setup and hold times must be met the clock edge before the clock edge on which RESET
   is sampled negated.
2. If asserted asynchronously, these signals must meet a minimum setup and hold time of two clocks relative to the negation of
   RESET.
3. BF[2:0] must meet a minimum setup time of 1.0 ms and a minimum hold time of two clocks relative to the negation of RESET.
4. If RESET is driven synchronously, BRDYC# must meet the specified hold time relative to the negation of RESET.
Table 68. RESET and Configuration Signals for 66-MHz Bus Operation

                                                Preliminary Data
Symbol  Parameter Description                   Min        Max  Figure  Comments
t90     RESET Setup Time                        5.0 ns          100
t91     RESET Hold Time                         1.0 ns          100
t92     RESET Pulse Width, Vcc and CLK Stable   15 clocks       100
t93     RESET Active After Vcc and CLK Stable   1.0 ms          100
t94     BF[2:0] Setup Time                      1.0 ms          100     Note 3
t95     BF[2:0] Hold Time                       2 clocks        100     Note 3
t96     BRDYC# Hold Time                        1.0 ns          100     Note 4
t97     BRDYC# Setup Time                       2 clocks        100     Note 2
t98     BRDYC# Hold Time                        2 clocks        100     Note 2
t99     FLUSH# Setup Time                       5.0 ns          100     Note 1
t100    FLUSH# Hold Time                        1.0 ns          100     Note 1
t101    FLUSH# Setup Time                       2 clocks        100     Note 2
t102    FLUSH# Hold Time                        2 clocks        100     Note 2

Notes:
1. To be sampled on a specific clock edge, setup and hold times must be met the clock edge before the clock edge on which RESET
   is sampled negated.
2. If asserted asynchronously, these signals must meet a minimum setup and hold time of two clocks relative to the negation of
   RESET.
3. BF[2:0] must meet a minimum setup time of 1.0 ms and a minimum hold time of two clocks relative to the negation of RESET.
4. If RESET is driven synchronously, BRDYC# must meet the specified hold time relative to the negation of RESET.
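
Several limits in Tables 67 and 68 are given in bus clocks rather than in time, so their absolute value depends on the bus frequency. The following sketch converts them and checks a reset sequence. The function names and the sample arguments are hypothetical, and the bus frequency is taken at its nominal value.

```python
RESET_MIN_PULSE_CLOCKS = 15   # RESET pulse width with Vcc and CLK stable
BF_MIN_SETUP_MS = 1.0         # BF[2:0] setup before RESET is negated
BF_MIN_HOLD_CLOCKS = 2        # BF[2:0] hold after RESET is negated


def clock_period_ns(bus_mhz):
    """Bus clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / bus_mhz


def reset_config_ok(bus_mhz, reset_pulse_ns, bf_setup_ms, bf_hold_ns):
    """Check a warm-reset sequence against the clock-based minimums."""
    period = clock_period_ns(bus_mhz)
    return (reset_pulse_ns >= RESET_MIN_PULSE_CLOCKS * period
            and bf_setup_ms >= BF_MIN_SETUP_MS
            and bf_hold_ns >= BF_MIN_HOLD_CLOCKS * period)


# At a nominal 100-MHz bus, 15 clocks correspond to 150 ns.
print(RESET_MIN_PULSE_CLOCKS * clock_period_ns(100))
print(reset_config_ok(100, reset_pulse_ns=150.0, bf_setup_ms=1.0, bf_hold_ns=20.0))
```

The same pulse that is long enough at 100 MHz can be too short on a 66-MHz bus, because each clock is longer there.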
Table 69. TCK Waveform and TRST# Timing at 25 MHz

Symbol  Parameter Description  Min  Comments
        TCK Frequency
t103    TCK Period
t104    TCK High Time
t105    TCK Low Time
t106    TCK Fall Time               Note 1, 2
t107    TCK Rise Time               Note 1, 2
t108    TRST# Pulse Width           Asynchronous

Notes:
1. Rise/Fall times can be increased by 1.0 ns for each 10 MHz that TCK is run below its maximum frequency of 25 MHz.
2. Rise/Fall times are measured between 0.8 V and 2.0 V.
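
Note 1 can be turned into a small helper. This is a sketch under two assumptions that the note does not settle: the relaxation is applied in whole 10-MHz steps, and the 25-MHz limit (`base_ns`) is supplied by the caller from Table 69. The 4.0 ns value used below is a placeholder, not a data sheet figure.

```python
TCK_MAX_MHZ = 25.0
RELAX_NS_PER_STEP = 1.0
STEP_MHZ = 10.0


def relaxed_rise_fall_ns(base_ns, tck_mhz):
    """Allowed TCK rise/fall time when TCK runs at or below 25 MHz.

    base_ns: the limit that applies at 25 MHz (caller-supplied)
    Assumes only complete 10-MHz steps below 25 MHz earn extra time.
    """
    if not 0 < tck_mhz <= TCK_MAX_MHZ:
        raise ValueError("tck_mhz must be in (0, 25]")
    steps = int((TCK_MAX_MHZ - tck_mhz) // STEP_MHZ)
    return base_ns + RELAX_NS_PER_STEP * steps


for mhz in (25, 20, 15, 5):
    print(mhz, relaxed_rise_fall_ns(4.0, mhz))  # 4.0 ns is a placeholder
```

A proportional interpretation (0.1 ns per MHz) would be less conservative. The stepwise reading is used here because it never grants more margin than the note states.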











Table 70. Test Signal Timing at 25 MHz

                                            Preliminary Data
Symbol  Parameter Description               Min  Max  Figure  Notes
t109    TDI Setup Time                                103     Note 2
t110    TDI Hold Time                                 103     Note 2
t111    TMS Setup Time                                103     Note 2
t112    TMS Hold Time                                 103     Note 2
t113    TDO Valid Delay                               103     Note 1
t114    TDO Float Delay                               103     Note 1
t115    All Outputs (Non-Test) Valid Delay            103     Note 1
t116    All Outputs (Non-Test) Float Delay            103     Note 1
t117    All Inputs (Non-Test) Setup Time              103     Note 2
t118    All Inputs (Non-Test) Hold Time               103     Note 2

Notes:
1. Parameter is measured from the TCK falling edge.
2. Parameter is measured from the TCK rising edge.
WAVEFORM key (waveform symbols not reproduced):

INPUTS                               OUTPUTS
Must be steady                       Steady
Can change from High to Low          Changing from High to Low
Can change from Low to High          Changing from Low to High
Don't care, any change permitted     Changing, State Unknown
(Does not apply)                     Center line is high impedance state

Figure 96. Diagrams Key
[Waveform labels: Tx, CLK at 1.5 V, Max, tv, Output Signal, Valid n, Valid n+1]

v = 6, 8, 10, 12, 14, 15, 17, 18, 20, 22, 24, 26, 27, 28, 29, 30, 32, 34, 36, 37, 39, 41, 42

Figure 97. Output Valid Delay Timing
[Waveform labels: Output Signal, Min]

v = 6, 8, 10, 12, 15, 18, 20, 22, 24, 30, 32, 34, 37, 39, 42
f = 7, 9, 11, 13, 16, 19, 21, 23, 25, 31, 33, 35, 38, 40, 43

Figure 98. Maximum Float Delay Timing
s = 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88
h = 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89

Figure 99. Input Setup and Hold Timing
Figure 100. Reset and Configuration Timing

Figure 101. TCK Waveform

Figure 102. TRST# Timing

Figure 103. Test Signal Timing Diagram

17 Thermal Design





17.1 Package Thermal Specifications 


The AMD-K6-2 processor operating specification calls for the 
case temperature (T_C) to be in the range of 0°C to 70°C, 0°C to 
65°C, or 0°C to 60°C, depending on the OPN suffix. The ambient 
temperature (T_A) is not specified as long as the case 
temperature is not violated. The case temperature must be 
measured on the top center of the package. Table 71 and 
Table 72 show the AMD-K6-2 processor thermal specifications 
for all valid OPN suffixes. 


Table 71. Package Thermal Specification for OPN Suffixes AHX, AFQ, and AFR

                      Maximum Thermal Power
θ_JC                  2.2 V Component                                                        2.4 V Component
Junction-Case         266 MHz   300 MHz   333 MHz   350 MHz   366 MHz   380 MHz   400 MHz*   450 MHz   475 MHz
1.0 °C/W              14.70 W   17.20 W   19.00 W   19.95 W   20.80 W   21.60 W   22.70 W    28.40 W   29.60 W
Stop Grant Mode       3.90 W    3.92 W    3.94 W    3.96 W    3.96 W    3.97 W    3.98 W     6.50 W    6.51 W
Stop Clock Mode       3.50 W    3.50 W    3.50 W    3.50 W    3.50 W    3.50 W    3.50 W     6.00 W    6.00 W
T_C Case Temperature  0°C-70°C  0°C-70°C  0°C-70°C  0°C-70°C  0°C-70°C  0°C-70°C  0°C-60°C   0°C-65°C  0°C-65°C

Note:
* Not applicable to OPN AMD-K6-2/400AFR. Refer to Table 72 for the AMD-K6-2/400AFR specifications.
Table 72. Package Thermal Specification for OPN Suffixes AGR, AFX, and 400AFR





                      Maximum Thermal Power
θ_JC                  2.2 V Component                                              2.3 V Component
Junction-Case         400 MHz(1)  450 MHz   475 MHz   500 MHz   533 MHz(2)  550 MHz
1.0 °C/W              16.90 W     18.80 W   19.80 W   20.75 W   20.75 W     25.00 W
Stop Grant Mode       4.40 W      4.44 W    4.45 W    4.46 W    4.46 W      4.87 W
Stop Clock Mode       4.00 W      4.00 W    4.00 W    4.00 W    4.00 W      4.37 W
T_C Case Temperature  0°C-70°C    0°C-65°C  0°C-65°C  0°C-65°C  0°C-65°C    0°C-70°C

Notes:
1. Specifications are applicable to OPN AMD-K6-2/400AFR.
2. The specifications provided for the 533 MHz component are identical to the specifications of the 500 MHz
   component.











Figure 104 shows the thermal model of a processor 
with a passive thermal solution. The case-to-ambient 
temperature (T_CA) can be calculated from the following 
equation: 

T_CA = P_MAX × θ_CA
     = P_MAX × (θ_IF + θ_SA)

Where 
P_MAX = Maximum Power Consumption 
θ_CA  = Case-to-Ambient Thermal Resistance 
θ_IF  = Interface Material Thermal Resistance 
θ_SA  = Sink-to-Ambient Thermal Resistance 
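
The equation above can be evaluated directly. The sketch below is illustrative only: the function names are hypothetical, the 0.45 °C/W heatsink is an assumed example value, and the resulting case temperature is taken as the ambient temperature plus T_CA.

```python
def case_to_ambient_rise(p_max_w, theta_if, theta_sa):
    """T_CA in °C: power in W times (interface + sink-to-ambient) in °C/W."""
    return p_max_w * (theta_if + theta_sa)


def case_temperature(t_ambient_c, p_max_w, theta_if, theta_sa):
    """Estimated case temperature T_C = T_A + T_CA, in °C."""
    return t_ambient_c + case_to_ambient_rise(p_max_w, theta_if, theta_sa)


# 475-MHz part (29.60 W) with an assumed 0.45 °C/W heatsink and
# 0.20 °C/W interface material at 45 °C ambient.
t_c = case_temperature(45.0, 29.60, theta_if=0.20, theta_sa=0.45)
print(round(t_c, 2), t_c <= 65.0)
```

Comparing the result with the case temperature limit for the OPN in question shows whether a candidate thermal solution has any margin.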
[Figure 104 labels: Temperature (Ambient), Thermal Resistance (°C/W), T_CA]

Figure 104. Thermal Model


Figure 105 illustrates the case-to-ambient temperature (T_CA) in 
relation to the power consumption (X-axis) and the thermal 
resistance (Y-axis). If the power consumption and case 
temperature are known, the thermal resistance (θ_CA) 
requirement can be calculated for a given ambient temperature 
(T_A) value. 

Figure 105. Power Consumption versus Thermal Resistance 
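
The relationship plotted in Figure 105 can also be tabulated. The sketch below computes the largest allowable θ_CA for each speed grade in Table 71 at a chosen ambient temperature. The dictionary is transcribed from Table 71, the function names are hypothetical, and the 45 °C ambient is an assumed operating point.

```python
# frequency in MHz -> (maximum thermal power in W, maximum case temperature in °C),
# transcribed from Table 71
TABLE_71 = {
    266: (14.70, 70), 300: (17.20, 70), 333: (19.00, 70),
    350: (19.95, 70), 366: (20.80, 70), 380: (21.60, 70),
    400: (22.70, 60), 450: (28.40, 65), 475: (29.60, 65),
}


def required_theta_ca(t_case_max_c, t_ambient_c, p_max_w):
    """Largest case-to-ambient resistance (°C/W) that keeps T_C in range."""
    if t_ambient_c >= t_case_max_c:
        raise ValueError("ambient must be below the case temperature limit")
    return (t_case_max_c - t_ambient_c) / p_max_w


def sweep(t_ambient_c):
    """Required theta_CA per speed grade, rounded to three decimals."""
    return {mhz: round(required_theta_ca(t_c, t_ambient_c, p), 3)
            for mhz, (p, t_c) in TABLE_71.items()}


for mhz, theta in sweep(45.0).items():
    print(f"{mhz} MHz: theta_CA <= {theta} °C/W")
```

Lower values mean a more demanding thermal solution, so the higher-power grades and the grades with lower case temperature limits dominate the design.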





288 Thermal Design Chapter 17 


Preliminary Information AMD 





21850)/0—February 2000 


AMD-K6®-2 Processor Data Sheet 


The thermal resistance of a heatsink is determined by the heat 
dissipation surface area, the material and shape of the 
heatsink, and the airflow volume across the heatsink. In 
general, the larger the surface area the lower the thermal 
resistance. 


The required thermal resistance of a heatsink (θ_SA) can be 
calculated using the following example: 

If: 
T_C   = 65°C 
T_A   = 45°C 
P_MAX = 29.60 W at 475 MHz 

Then: 
θ_CA = (T_C - T_A) / P_MAX = 20°C / 29.60 W = 0.676 (°C/W) 
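
The heatsink budget in this example can be reproduced with a short calculation. This is a sketch rather than a sizing tool: the function name is hypothetical, and the interface resistance is passed in as a parameter (0.20 °C/W is the figure this data sheet quotes for thermal grease).

```python
def required_theta_sa(t_case_max_c, t_ambient_c, p_max_w, theta_if):
    """Largest sink-to-ambient resistance (°C/W) a heatsink may have.

    theta_CA = (T_C - T_A) / P_MAX, and theta_SA = theta_CA - theta_IF.
    """
    theta_ca = (t_case_max_c - t_ambient_c) / p_max_w
    theta_sa = theta_ca - theta_if
    if theta_sa <= 0:
        raise ValueError("interface resistance alone exceeds the budget")
    return theta_sa


# Example values from the text: T_C = 65 °C, T_A = 45 °C, 29.60 W at 475 MHz.
budget = required_theta_sa(65.0, 45.0, 29.60, theta_if=0.20)
print(round(budget, 3))
```

A heatsink whose rated sink-to-ambient resistance is at or below this budget meets the example conditions. A higher ambient temperature or a poorer interface material shrinks the budget.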
Thermal grease is recommended as interface material because 
it provides the lowest thermal resistance (approximately 
0.20°C/W). The required thermal resistance (θ_SA) of the 
heatsink in this example is calculated as follows: 

θ_SA = θ_CA - θ_IF = 0.676 - 0.20 = 0.476 (°C/W) 

Heat Dissipation Path 

Figure 106 illustrates the heat dissipation path of the processor. 
Due to the lower thermal resistance between the processor die 
junction and case, most of the heat generated by the processor 
is transferred from the top surface of the case. The small 
amount of heat dissipated from the bottom side of the processor, 
where the processor socket blocks convection, can be safely 
ignored. 

[Figure 106 labels: Ambient Temperature, Case temperature, Thin Lid]

Figure 106. Processor Heat Dissipation Path 

Measuring Case Temperature 

The processor case temperature is measured to ensure that the 
thermal solution meets the processor’s operational 
specification. This temperature should be measured on the top 
center of the package, where most of the heat is dissipated. 
Figure 107 shows the correct location for measuring the case 
temperature. If a heatsink is installed while measuring, the 
thermocouple must be installed into the heatsink via a small 
hole drilled through the heatsink base (for example, 1/16 of an 
inch). The thermocouple is then attached to the base of the 
heatsink and the small hole filled using thermal epoxy, allowing 
the tip of the thermocouple to touch the top of the processor 
case. 

[Figure 107 labels: Thermally Conductive Epoxy, Thermocouple]

Figure 107. Measuring Case Temperature 


17.2 Layout and Airflow Considerations 


Voltage Regulator 


A voltage regulator is required to supply the lower voltage 
(3.3 V and lower) to the processor. In most applications, the 
voltage regulator is designed with power transistors. As a 
result, additional heatsinks are required to dissipate the heat 
from the power transistors. Figure 108 shows the voltage 
regulator placed parallel to the processor with the airflow 
aligned with the devices. With this alignment, the heat 
generated by the voltage regulator has minimal effect on the 
processor. 


[Figure 108 labels: Voltage Regulator, Processor, Airflow]

Figure 108. Voltage Regulator Placement 







Airflow Management in a System Design 


A heatsink and fan combination can deliver much better 
thermal performance than a heatsink alone. More importantly, 
with a fan sink, the airflow requirements in a system design are 
not as critical. A unidirectional heatsink with a fan moves air 
from the top of the heatsink to the side. In this case, the best 
location for the voltage regulator is on the side of the processor 
in the path of the airflow exiting the fan sink (see Figure 109). 
This location guarantees that the heatsinks on both the 
processor and the regulator receive adequate air circulation. 





[Figure 109 label: Ideal areas for voltage regulator]

Figure 109. Airflow for a Heatsink with Fan 


Complete airflow management in a system is important. In 
addition to the volume of air, the path of the air is also 
important. Figure 110 shows the airflow in a dual-fan system. 
The fan in the front end pulls cool air into the system through 
intake slots in the chassis. The power supply fan forces the hot 
air out of the chassis. The thermal performance of the heatsink 
can be maximized if it is located in the shaded area, where it 
receives the greatest benefit from this air exchange system. 

[Figure 110 label: Main Board]

Figure 110. Airflow Path in a Dual-Fan System 


Figure 111 shows the airflow management in a system using the 
ATX form-factor. The orientation of the power supply fan and 
the motherboard are modified in the ATX platform design. The 
power supply fan pulls cool air through the chassis and across 
the processor. The processor is located near the power supply 
fan, where it can receive adequate airflow without an auxiliary 
fan. The arrangement significantly improves the airflow across 
the processor with minimum installation cost. 

Figure 111. Airflow Path in an ATX Form-Factor System 

For more information about thermal design considerations, see 
the AMD-K6® Processor Thermal Solution Design Application 
Note, order# 21085. 

20 Package Specifications 
20.1 321-Pin Staggered CPGA Package Specification 
Table 73. 321-Pin Staggered CPGA Package Specification 

        Millimeters               Inches
Symbol  Min    Max    Notes       Min    Max    Notes
A       49.28  49.78              1.940  1.960
B       45.59  45.85              1.795  1.805
C       31.01  32.89              1.221  1.295
D       44.90  45.10              1.768  1.776
E       2.91   3.63               0.115  0.143
F       1.30   1.52               0.051  0.060
G       3.05   3.30               0.120  0.130
H       0.43   0.51               0.017  0.020
M       2.29   2.79               0.090  0.110
N       1.14   1.40               0.045  0.055
d       1.52   2.29               0.060  0.090
e       1.52   2.54               0.060  0.100
f       -      0.13   Flatness    -      0.005  Flatness

Figure 114. 321-Pin Staggered CPGA Package Specification 
21 Ordering Information 





Standard AMD-K6®-2 Processor Model 8 Products 


AMD standard products are available in several operating ranges. The ordering part 
number (OPN) is formed by a combination of the elements below. 





Example OPN: AMD-K6-2/550AGR (fields listed from right to left) 

Case Temperature (R in the example) 
    Q = 0°C-60°C 
    R = 0°C-70°C 
    X = 0°C-65°C 

Operating Voltage (G in the example) 
    F = 2.1 V-2.3 V (Core) / 3.135 V-3.6 V (I/O) 
    G = 2.2 V-2.4 V (Core) / 3.135 V-3.6 V (I/O) 
    H = 2.3 V-2.5 V (Core) / 3.135 V-3.6 V (I/O) 

Package Type (A in the example) 
    A = 321-pin CPGA 

Performance Rating (/550 in the example) 
    /550  /533  /500 
    /475  /450  /400 
    /380  /350  /333 
    /300  /266 

Family/Core 
    AMD-K6-2 
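
The OPN structure lends itself to a small decoder. The following is a minimal sketch, not an AMD tool: the function name and regular expression are hypothetical, the voltage and package codes are those listed above, and the case temperature codes (Q, R, X) are taken from the suffixes used in Table 74.

```python
import re

VOLTAGE = {
    "F": "2.1 V-2.3 V (Core) / 3.135 V-3.6 V (I/O)",
    "G": "2.2 V-2.4 V (Core) / 3.135 V-3.6 V (I/O)",
    "H": "2.3 V-2.5 V (Core) / 3.135 V-3.6 V (I/O)",
}
PACKAGE = {"A": "321-pin CPGA"}
CASE_TEMP = {"Q": "0°C-60°C", "R": "0°C-70°C", "X": "0°C-65°C"}

# Family/core, performance rating, then package, voltage, and temperature codes.
OPN_RE = re.compile(r"^(AMD-K6-2)/(\d{3})([A])([FGH])([QRX])$")


def parse_opn(opn):
    """Split an ordering part number into its documented fields."""
    match = OPN_RE.match(opn.strip())
    if match is None:
        raise ValueError(f"not a recognized AMD-K6-2 OPN: {opn!r}")
    family, rating, package, voltage, temp = match.groups()
    return {
        "family": family,
        "performance_rating": int(rating),
        "package": PACKAGE[package],
        "operating_voltage": VOLTAGE[voltage],
        "case_temperature": CASE_TEMP[temp],
    }


print(parse_opn("AMD-K6-2/550AGR"))
```

A syntactically valid OPN is not necessarily an orderable part. Table 74 remains the reference for valid combinations.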





Table 74. Valid Ordering Part Number Combinations 

OPN              Package Type  Operating Voltage                        Case Temperature
AMD-K6-2/550AGR  321-pin CPGA  2.2 V-2.4 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/533AFX  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/500AFX  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/475AFX  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/475AHX  321-pin CPGA  2.3 V-2.5 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/450AFX  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/450AHX  321-pin CPGA  2.3 V-2.5 V (Core), 3.135 V-3.6 V (I/O)  0°C-65°C
AMD-K6-2/400AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/400AFQ  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-60°C
AMD-K6-2/380AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/366AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/350AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/333AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/300AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C
AMD-K6-2/266AFR  321-pin CPGA  2.1 V-2.3 V (Core), 3.135 V-3.6 V (I/O)  0°C-70°C

Note: 
This table lists configurations planned to be supported in volume for this device. Consult the local 
AMD sales office to confirm availability of specific valid combinations and to check on newly-released 
combinations. 